Paper title
CASAPose: Class-Adaptive and Semantic-Aware Multi-Object Pose Estimation
Paper authors
Paper abstract
Applications in augmented reality and robotics often require joint localisation and 6D pose estimation of multiple objects. However, most algorithms need one network to be trained per object class in order to provide the best results. Analysing all visible objects then demands multiple inferences, which is memory-intensive and time-consuming. We present a new single-stage architecture called CASAPose that determines 2D-3D correspondences for pose estimation of multiple different objects in RGB images in one pass. It is fast and memory efficient, and achieves high accuracy for multiple objects by exploiting the output of a semantic segmentation decoder as control input to a keypoint recognition decoder via local class-adaptive normalisation. Our new differentiable regression of keypoint locations significantly contributes to a faster closing of the domain gap between real test and synthetic training data. We apply segmentation-aware convolutions and upsampling operations to increase the focus inside the object mask and to reduce mutual interference of occluding objects. For each inserted object, the network grows by only one output segmentation map and a negligible number of parameters. We outperform state-of-the-art approaches in challenging multi-object scenes with inter-object occlusion and synthetic training.
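The idea of local class-adaptive normalisation mentioned in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the function name `class_adaptive_norm`, the per-channel normalisation over the whole image, and the nested-list data layout are all assumptions made for illustration. The sketch only shows the general principle that a segmentation map selects, per pixel, which class-specific scale and shift are applied to the normalised features.

```python
import math


def class_adaptive_norm(features, labels, gamma, beta, eps=1e-5):
    """Illustrative sketch (hypothetical helper, not the paper's code).

    features: list of C channels, each a list of H rows with W floats
    labels:   H x W nested list of integer class ids (the segmentation map)
    gamma:    K x C nested list of per-class, per-channel scales
    beta:     K x C nested list of per-class, per-channel shifts
    """
    out = []
    for c, channel in enumerate(features):
        # Parameter-free step: zero mean, unit variance per channel.
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        std = math.sqrt(var + eps)
        # Class-adaptive step: the class id at each pixel picks scale and shift.
        out.append([
            [(v - mean) / std * gamma[k][c] + beta[k][c]
             for v, k in zip(row, label_row)]
            for row, label_row in zip(channel, labels)
        ])
    return out


# Toy input: 2 feature channels on a 2 x 3 grid, 2 classes (0 = background).
features = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[0.0, 0.0, 1.0], [1.0, 2.0, 2.0]],
]
labels = [[0, 0, 1], [0, 1, 1]]
gamma = [[1.0, 1.0], [2.0, 0.5]]   # one row per class, one column per channel
beta = [[0.0, 0.0], [0.5, -1.0]]

out = class_adaptive_norm(features, labels, gamma, beta)
```

Under this scheme, supporting an additional object class means adding one row to `gamma` and one to `beta`. That is consistent with the abstract's statement that the network grows by a negligible number of parameters per object, although the actual parameterisation in CASAPose may differ.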