Paper Title

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning

Authors

Kexue Fu, Peng Gao, Renrui Zhang, Hongsheng Li, Yu Qiao, Manning Wang

Abstract

In this paper, we propose a simple and general framework for self-supervised point cloud representation learning. Human beings understand the 3D world by extracting two levels of information and establishing the relationship between them: the global shape of an object and its local structures. However, few existing studies in point cloud representation learning have explored how to learn both global shapes and local-to-global relationships without a specially designed network architecture. Inspired by how human beings understand the world, we utilize knowledge distillation to learn both global shape information and the relationship between global shape and local structures. At the same time, we combine contrastive learning with knowledge distillation so that the teacher network is updated more effectively. Our method achieves state-of-the-art performance on linear classification and multiple other downstream tasks. In particular, we develop a variant of ViT for 3D point cloud feature extraction, which achieves results comparable to existing backbones when combined with our framework, and visualization of the attention maps shows that our model understands a point cloud by combining global shape information with multiple pieces of local structural information, which is consistent with the inspiration behind our representation learning method. Our code will be released soon.
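The two ingredients the abstract names, a contrastive objective between student and teacher embeddings and a teacher that is updated from the student, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes an InfoNCE-style contrastive loss with in-batch negatives (positives on the diagonal) and an exponential-moving-average (EMA) teacher update; the function names, the temperature, and the momentum value are illustrative choices.

```python
import numpy as np

def info_nce_loss(student_emb, teacher_emb, temperature=0.07):
    """InfoNCE-style contrastive loss.

    Row i of student_emb (e.g. a local crop of a point cloud) is the
    positive pair of row i of teacher_emb (e.g. the full shape); all
    other rows in the batch act as negatives.
    """
    # L2-normalize so the dot product is a cosine similarity
    s = student_emb / np.linalg.norm(student_emb, axis=1, keepdims=True)
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=1, keepdims=True)
    logits = (s @ t.T) / temperature          # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))        # cross-entropy on diagonal

def ema_update(teacher_w, student_w, momentum=0.996):
    """Update teacher weights as an exponential moving average of the student."""
    return momentum * teacher_w + (1 - momentum) * student_w
```

In a training loop, the student would be optimized by gradient descent on the contrastive (and distillation) losses, while `ema_update` is applied to every teacher parameter after each step, so the teacher tracks a smoothed version of the student rather than receiving gradients directly.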
