Paper Title
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges
Paper Authors
Abstract
In this report, we present our champion solutions to five tracks of the Ego4D challenge. We leverage InternVideo, a video foundation model we developed, for five Ego4D tasks: Moment Queries, Natural Language Queries, Future Hand Prediction, State Change Object Detection, and Short-term Object Interaction Anticipation. InternVideo-Ego4D is an effective paradigm for adapting a strong foundation model to downstream egocentric video understanding tasks with simple head designs. On all five tasks, InternVideo-Ego4D outperforms both the baseline methods and the CVPR 2022 champions, demonstrating the strong representation ability of InternVideo as a video foundation model. Our code will be released at https://github.com/OpenGVLab/ego4d-eccv2022-solutions.