Understanding real-world videos with complex semantics and long temporal dependencies remains a fundamental challenge in computer vision. Recent progress in multimodal large language models (MLLMs) ...
Abstract: Depression, driven by growing societal pressures, significantly disrupts individuals’ physical and mental health. Automatic Depression Recognition (ADR) via facial videos has gained ...
https://huggingface.co/hkuds/OpenCity-Plus These are the model weights of our OpenCity-Plus. https://huggingface.co/datasets/hkuds/OpenCity-dataset/tree/main We released ...
Abstract: Dual-arm robots can perform bimanual long-horizon (LH) manipulation, surpassing the capabilities of single-arm robots. However, bimanual LH tasks are challenging for robot intelligence due ...