Thesis Title
Learning Equivariant Representations
Thesis Author
Thesis Abstract
State-of-the-art deep learning systems often require large amounts of data and computation. For this reason, leveraging known or unknown structure in the data is paramount. Convolutional neural networks (CNNs) are a successful example of this principle, their defining characteristic being shift equivariance: because the same filter slides over the entire input, when the input shifts, the response shifts by the same amount. This exploits the structure of natural images, where semantic content is independent of absolute pixel position, and it is essential to the success of CNNs in audio, image, and video recognition tasks. In this thesis, we extend equivariance to other kinds of transformations, such as rotation and scaling. We propose equivariant models for different transformations defined by groups of symmetries. The main contributions are (i) polar transformer networks, achieving equivariance to the group of similarities on the plane, (ii) equivariant multi-view networks, achieving equivariance to the group of symmetries of the icosahedron, (iii) spherical CNNs, achieving equivariance to the continuous 3D rotation group, (iv) cross-domain image embeddings, achieving equivariance to 3D rotations for 2D inputs, and (v) spin-weighted spherical CNNs, generalizing the spherical CNNs and achieving equivariance to 3D rotations for spherical vector fields. Applications include image classification, 3D shape classification and retrieval, panoramic image classification and segmentation, shape alignment, and pose estimation. What these models have in common is that they leverage symmetries in the data to reduce sample and model complexity and improve generalization performance. The advantages are most significant on (but not limited to) challenging tasks where data is limited or input perturbations such as arbitrary rotations are present.
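The shift-equivariance property described above can be checked numerically. The following is a minimal sketch, not taken from the thesis: a hand-rolled circular 1D convolution in NumPy (the function name `circular_conv` is our own), for which shifting the input and then convolving gives the same result as convolving and then shifting the output.

```python
import numpy as np

def circular_conv(x, w):
    """Circular 1D cross-correlation: y[i] = sum_j w[j] * x[(i + j) mod n]."""
    n, k = len(x), len(w)
    return np.array([sum(w[j] * x[(i + j) % n] for j in range(k))
                     for i in range(n)])

x = np.array([1.0, 2.0, 3.0, 4.0, 0.0, 0.0])  # toy input signal
w = np.array([1.0, -1.0])                      # toy filter
s = 2                                          # shift amount

# Shift the input first, then convolve...
shift_then_conv = circular_conv(np.roll(x, s), w)
# ...versus convolve first, then shift the output.
conv_then_shift = np.roll(circular_conv(x, w), s)

# Equivariance: the two orders of operations agree.
assert np.allclose(shift_then_conv, conv_then_shift)
```

The same check fails for a non-equivariant map (e.g. a fully connected layer with arbitrary weights), which is precisely why convolution is the natural choice when absolute position carries no semantic meaning.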