Title
Learning Latent Part-Whole Hierarchies for Point Clouds
Authors
Abstract
Strong evidence suggests that humans perceive the 3D world by parsing visual scenes and objects into part-whole hierarchies. Although deep neural networks can learn powerful multi-level representations, they cannot explicitly model part-whole hierarchies, which limits their expressiveness and interpretability when processing 3D vision data such as point clouds. To this end, we propose an encoder-decoder style latent variable model that explicitly learns part-whole hierarchies for multi-level point cloud segmentation. Specifically, the encoder takes a point cloud as input and predicts a per-point latent subpart distribution at the middle level. The decoder takes the latent variable and the features from the encoder as input and predicts a per-point part distribution at the top level. During training, only top-level part labels are provided and the middle-level subparts receive no supervision, which makes the whole framework weakly supervised. We explore two approximate inference algorithms, namely the most-probable-latent and Monte Carlo methods, and three stochastic gradient estimators for learning discrete latent variables, namely the straight-through, REINFORCE, and pathwise estimators. Experimental results on the PartNet dataset show that the proposed method achieves state-of-the-art performance not only in top-level part segmentation but also in middle-level latent subpart segmentation.
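The following is a minimal, non-authoritative sketch of the latent-variable formulation the abstract describes. It assumes the usual marginal-likelihood objective p(y|x) = sum_z p(z|x) p(y|z), and it replaces the paper's networks with a single point, a tiny number of subparts K and parts C, and lookup tables of logits. Every function name and number below is hypothetical. The code contrasts the exact marginal with the most-probable-latent and Monte Carlo approximations and checks a REINFORCE (score-function) gradient estimate against the closed-form gradient. The straight-through and pathwise estimators are not shown.

```python
import math
import random

# Toy, hypothetical setting (not the paper's networks): one point, K latent
# subparts and C top-level parts. The "encoder" is a list of K logits for
# p(z|x); the "decoder" is a K x C table of logits for p(y|z). The decoder's
# dependence on encoder features is dropped for brevity.


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decoder_prob(dec_logits, z, y):
    """p(y | z): probability of top-level part y given latent subpart z."""
    return softmax(dec_logits[z])[y]


def exact_log_marginal(enc_logits, dec_logits, y):
    """log p(y|x) = log sum_z p(z|x) p(y|z); tractable only because K is tiny."""
    q = softmax(enc_logits)
    return math.log(sum(q[z] * decoder_prob(dec_logits, z, y) for z in range(len(q))))


def most_probable_latent(enc_logits, dec_logits, y):
    """Keep only z* = argmax_z p(z|x) and score log p(y|z*)."""
    q = softmax(enc_logits)
    z_star = max(range(len(q)), key=lambda k: q[k])
    return math.log(decoder_prob(dec_logits, z_star, y))


def sample_categorical(probs, rng):
    """Draw an index k with probability probs[k] by inverting the CDF."""
    r, cum = rng.random(), 0.0
    for k, p in enumerate(probs):
        cum += p
        if r < cum:
            return k
    return len(probs) - 1  # guard against floating-point round-off


def monte_carlo_log_marginal(enc_logits, dec_logits, y, num_samples, rng):
    """log of the sample mean of p(y|z_s) with z_s ~ p(z|x)."""
    q = softmax(enc_logits)
    total = sum(decoder_prob(dec_logits, sample_categorical(q, rng), y)
                for _ in range(num_samples))
    return math.log(total / num_samples)


def reinforce_grad(enc_logits, reward, num_samples, rng):
    """Score-function estimate of d/d(logits) E_{z~p(z|x)}[reward(z)].

    For a softmax, d log p(z) / d logit_k = 1[k == z] - p_k.
    """
    q = softmax(enc_logits)
    grad = [0.0] * len(q)
    for _ in range(num_samples):
        z = sample_categorical(q, rng)
        f = reward(z)
        for k in range(len(q)):
            grad[k] += f * ((1.0 if k == z else 0.0) - q[k])
    return [g / num_samples for g in grad]


def exact_grad(enc_logits, reward):
    """Closed form p_k * (reward(k) - E[reward]) for comparison."""
    q = softmax(enc_logits)
    mean_f = sum(q[z] * reward(z) for z in range(len(q)))
    return [q[k] * (reward(k) - mean_f) for k in range(len(q))]


if __name__ == "__main__":
    rng = random.Random(0)
    enc_logits = [2.0, 0.5, -1.0]                       # K = 3 subparts
    dec_logits = [[3.0, 0.0], [0.0, 3.0], [1.0, 1.0]]   # C = 2 parts
    y = 0                                               # observed top-level label

    def reward(z):
        return decoder_prob(dec_logits, z, y)

    print("exact log p(y|x):", exact_log_marginal(enc_logits, dec_logits, y))
    print("most-probable-latent:", most_probable_latent(enc_logits, dec_logits, y))
    print("Monte Carlo (10k):", monte_carlo_log_marginal(enc_logits, dec_logits, y, 10_000, rng))
    print("REINFORCE grad:", reinforce_grad(enc_logits, reward, 20_000, rng))
    print("exact grad:    ", exact_grad(enc_logits, reward))
```

With these toy numbers the most-probable-latent score is higher than the exact log marginal, so in this sketch it acts as a heuristic rather than a bound. The Monte Carlo estimate of a log of an average is consistent but, by Jensen's inequality, biased downward for a finite number of samples.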