Paper Title

EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries

Paper Authors

Jinjie Mai, Abdullah Hamdi, Silvio Giancola, Chen Zhao, Bernard Ghanem

Paper Abstract

With the recent advances in video and 3D understanding, novel 4D spatio-temporal methods fusing both concepts have emerged. Towards this direction, the Ego4D Episodic Memory Benchmark proposed a task for Visual Queries with 3D Localization (VQ3D). Given an egocentric video clip and an image crop depicting a query object, the goal is to localize the 3D position of the center of that query object with respect to the camera pose of a query frame. Current methods tackle the problem of VQ3D by unprojecting the 2D localization results of the sibling task Visual Queries with 2D Localization (VQ2D) into 3D predictions. Yet, we point out that the low number of camera poses caused by camera re-localization from previous VQ3D methods severely hinders their overall success rate. In this work, we formalize a pipeline (we dub EgoLoc) that better entangles 3D multiview geometry with 2D object retrieval from egocentric videos. Our approach involves estimating more robust camera poses and aggregating multi-view 3D displacements by leveraging the 2D detection confidence, which enhances the success rate of object queries and leads to a significant improvement in the VQ3D baseline performance. Specifically, our approach achieves an overall success rate of up to 87.12%, which sets a new state-of-the-art result in the VQ3D task. We provide a comprehensive empirical analysis of the VQ3D task and existing solutions, and highlight the remaining challenges in VQ3D. The code is available at https://github.com/Wayne-Mai/EgoLoc.
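To make the core idea in the abstract concrete, below is a minimal sketch of unprojecting per-frame 2D detections into 3D and aggregating them with a confidence-weighted average, expressed relative to the query camera. It is an illustration of the general technique, not the authors' implementation; the function names, dictionary keys, and data layout (e.g., `unproject_to_world`, `det["confidence"]`) are hypothetical.

```python
import numpy as np

def unproject_to_world(u, v, depth, K, cam_to_world):
    """Back-project a pixel (u, v) with metric depth into world coordinates."""
    # Pixel -> camera-frame 3D point using the 3x3 intrinsics K.
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    p_cam = np.array([x, y, depth, 1.0])
    # Camera frame -> world frame using the 4x4 camera-to-world pose.
    return (cam_to_world @ p_cam)[:3]

def aggregate_displacement(detections, query_world_to_cam):
    """Confidence-weighted average of per-view 3D estimates, returned as a
    displacement in the query frame's camera coordinates."""
    points, weights = [], []
    for det in detections:  # one entry per view where the query object was detected
        p_world = unproject_to_world(det["u"], det["v"], det["depth"],
                                     det["K"], det["cam_to_world"])
        points.append(p_world)
        weights.append(det["confidence"])  # 2D detector confidence as the weight
    points = np.stack(points)                # (N, 3)
    weights = np.asarray(weights)             # (N,)
    p_world = (weights[:, None] * points).sum(0) / weights.sum()
    # Express the aggregated object center relative to the query camera pose.
    return (query_world_to_cam @ np.append(p_world, 1.0))[:3]
```

The weighting step reflects the abstract's point that views with more reliable 2D detections should dominate the multi-view 3D estimate; views with no valid camera pose would simply be excluded from `detections`.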
