Understanding real-world videos with complex semantics and long temporal dependencies remains a fundamental challenge in computer vision. Recent progress in multimodal large language models (MLLMs) ...
https://huggingface.co/hkuds/OpenCity-Plus These are the model weights for OpenCity-Plus. https://huggingface.co/datasets/hkuds/OpenCity-dataset/tree/main We released ...
Abstract: Dual-arm robots can perform bimanual long-horizon (LH) manipulation, surpassing the capabilities of single-arm robots. However, bimanual LH tasks are challenging for robot intelligence due ...