Paper Title
GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-aware Supervision
Paper Authors
Paper Abstract
We present a novel end-to-end framework named GSNet (Geometric and Scene-aware Network), which jointly estimates 6DoF poses and reconstructs detailed 3D car shapes from a single urban street-view image. GSNet utilizes a unique four-way feature extraction and fusion scheme and directly regresses 6DoF poses and shapes in a single forward pass. Extensive experiments show that our diverse feature extraction and fusion scheme can greatly improve model performance. Based on a divide-and-conquer 3D shape representation strategy, GSNet reconstructs detailed 3D vehicle shapes (1352 vertices and 2700 faces). This dense mesh representation further leads us to consider geometrical consistency and scene context, and inspires a new multi-objective loss function to regularize network training, which in turn improves the accuracy of 6D pose estimation and validates the merit of jointly performing both tasks. We evaluate GSNet on the largest multi-task ApolloCar3D benchmark and achieve state-of-the-art performance both quantitatively and qualitatively. The project page is available at https://lkeab.github.io/gsnet/.
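The abstract does not specify the form of the multi-objective loss. As a hedged illustration only, a joint objective over pose and a dense mesh might combine a 6DoF pose term, a per-vertex shape term, and a simple geometric regularizer; the function name, weights, and regularizer below are all hypothetical, not GSNet's actual formulation:

```python
import numpy as np

def joint_loss(pred_pose, gt_pose, pred_verts, gt_verts,
               w_pose=1.0, w_shape=1.0, w_reg=0.1):
    """Hypothetical multi-objective loss sketch (not GSNet's formulation).

    Combines a 6DoF pose term, a dense-mesh vertex term, and a toy
    geometric regularizer over the predicted vertices.
    """
    # L2 error on the 6DoF pose vector (e.g. 3 translation + 3 rotation params)
    pose_term = np.mean((pred_pose - gt_pose) ** 2)
    # Mean per-vertex Euclidean error over the dense mesh (e.g. 1352 vertices)
    shape_term = np.mean(np.linalg.norm(pred_verts - gt_verts, axis=-1))
    # Toy "geometric consistency" regularizer: variance of vertex radii,
    # standing in for the geometry-aware terms described in the abstract
    reg_term = np.var(np.linalg.norm(pred_verts, axis=-1))
    return w_pose * pose_term + w_shape * shape_term + w_reg * reg_term
```

Weighting several task-specific terms this way is the standard mechanism by which one task (dense shape) can regularize another (6D pose) in joint training.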