Paper Title
Monocular BEV Perception of Road Scenes via Front-to-Top View Projection
Paper Authors
Abstract
HD map reconstruction is crucial for autonomous driving. LiDAR-based methods are limited by expensive sensors and time-consuming computation. Camera-based methods usually need to perform road segmentation and view transformation separately, which often causes distortion and missing content. To push the limits of the technology, we present a novel framework that reconstructs a local map, formed by the road layout and vehicle occupancy in the bird's-eye view, given only a front-view monocular image. We propose a front-to-top view projection (FTVP) module, which takes the constraint of cycle consistency between views into account and makes full use of their correlation to strengthen view transformation and scene understanding. In addition, we apply multi-scale FTVP modules to propagate the rich spatial information of low-level features and mitigate spatial deviation in the predicted object locations. Experiments on public benchmarks show that our method achieves state-of-the-art performance in the tasks of road layout estimation, vehicle occupancy estimation, and multi-class semantic estimation. For multi-class semantic estimation, in particular, our model outperforms all competitors by a large margin. Furthermore, our model runs at 25 FPS on a single GPU, making it efficient and applicable to real-time panoramic HD map reconstruction.
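The cycle-consistency idea in the abstract can be illustrated with a minimal numerical sketch: features are projected from the front view to the bird's-eye view (BEV) and then reprojected back, and the reconstruction error serves as a cycle-consistency loss. The linear projection matrices, grid sizes, and loss below are illustrative assumptions, not the paper's actual attention-based FTVP implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature-grid sizes (assumptions for illustration only).
H, W, C = 8, 8, 16   # front-view feature grid (height, width, channels)
Hb, Wb = 8, 8        # bird's-eye-view (BEV) feature grid

# Learned front-to-top and top-to-front projections, modeled here as
# plain linear maps over flattened spatial grids. These stand in for
# the correlation-based FTVP module described in the abstract.
W_f2t = rng.normal(0.0, 0.01, size=(Hb * Wb, H * W))
W_t2f = rng.normal(0.0, 0.01, size=(H * W, Hb * Wb))

def front_to_top(feat):
    """Project front-view features (H*W, C) to BEV features (Hb*Wb, C)."""
    return W_f2t @ feat

def top_to_front(feat):
    """Project BEV features (Hb*Wb, C) back to the front view (H*W, C)."""
    return W_t2f @ feat

front_feat = rng.normal(size=(H * W, C))
bev_feat = front_to_top(front_feat)

# Cycle-consistency loss: reprojecting BEV features to the front view
# should reconstruct the original features; training would minimize this.
cycle_feat = top_to_front(bev_feat)
cycle_loss = float(np.mean((cycle_feat - front_feat) ** 2))
print(bev_feat.shape, cycle_loss)
```

In the actual model this constraint would be one term of the training objective alongside the BEV segmentation losses for road layout and vehicle occupancy; the sketch only shows how the two projections and the consistency term fit together.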