Spherical Transformer

Paper Title

Spherical Transformer

Authors

Cho, Sungmin; Jung, Raehyuk; Kwon, Junseok

Abstract

Applying convolutional neural networks to 360° images can yield sub-optimal performance because of the distortions introduced by planar projection. The distortion worsens when a rotation is applied to the 360° image. Many convolution-based studies therefore attempt to reduce these distortions in order to learn accurate representations. In contrast, we leverage the transformer architecture to solve image classification problems for 360° images. Using the proposed transformer for 360° images has two advantages. First, because our method samples pixels directly from the surface of the sphere, it does not require the error-prone planar projection process. Second, our sampling method, which is based on regular polyhedrons, yields low rotation equivariance error, because specific rotations reduce to permutations of faces. In experiments, we validate our network on two aspects. First, we show that using a transformer with highly uniform sampling methods can help reduce the distortion. Second, we demonstrate that the transformer architecture can achieve rotation equivariance for specific rotations. We compare our method with other state-of-the-art algorithms on the SPH-MNIST, SPH-CIFAR, and SUN360 datasets and show that it is competitive with them.
