Paper Title
Rotate to Attend: Convolutional Triplet Attention Module
Paper Authors
Paper Abstract
Benefiting from the capability of building inter-dependencies among channels or spatial locations, attention mechanisms have recently been extensively studied and broadly used in a variety of computer vision tasks. In this paper, we investigate light-weight but effective attention mechanisms and present triplet attention, a novel method for computing attention weights by capturing cross-dimension interaction using a three-branch structure. For an input tensor, triplet attention builds inter-dimensional dependencies by the rotation operation followed by residual transformations and encodes inter-channel and spatial information with negligible computational overhead. Our method is simple as well as efficient and can be easily plugged into classic backbone networks as an add-on module. We demonstrate the effectiveness of our method on various challenging tasks including image classification on ImageNet-1k and object detection on the MSCOCO and PASCAL VOC datasets. Furthermore, we provide extensive insight into the performance of triplet attention by visually inspecting the GradCAM and GradCAM++ results. The empirical evaluation of our method supports our intuition on the importance of capturing dependencies across dimensions when computing attention weights. Code for this paper can be publicly accessed at https://github.com/LandskapeAI/triplet-attention
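To make the three-branch idea concrete, here is a minimal NumPy sketch of the computation the abstract describes: each branch attends across a different pair of dimensions of a (C, H, W) tensor by pooling along the remaining axis and gating the input with a sigmoid map, and the three refined tensors are averaged. This is a simplification for illustration only: selecting a pooling axis stands in for the paper's explicit rotation of the tensor, and a plain mean over the two pooled statistics stands in for the learned k×k convolution plus batch norm used in the actual module; all function names here are hypothetical, not the authors' API.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def z_pool(x, axis):
    # "Z-pool": stack max- and mean-pooled maps along the reduced axis,
    # giving a 2-channel summary of that dimension.
    return np.stack([x.max(axis=axis), x.mean(axis=axis)], axis=axis)

def branch_attention(x, axis):
    # One branch: pool along `axis`, collapse the 2 pooled statistics to a
    # single map (a simple mean stands in for the learned conv + BN of the
    # real module), then gate the input with a sigmoid attention map.
    pooled = z_pool(x, axis)
    attn = sigmoid(pooled.mean(axis=axis))
    return x * np.expand_dims(attn, axis)

def triplet_attention(x):
    # x: (C, H, W). Three branches capture cross-dimension interaction:
    #   axis=0 pools over C  -> spatial (H, W) attention
    #   axis=1 pools over H  -> (C, W) interaction
    #   axis=2 pools over W  -> (C, H) interaction
    # Choosing the pooling axis plays the role of the rotation operation.
    branches = [branch_attention(x, axis=a) for a in (0, 1, 2)]
    return sum(branches) / 3.0  # average the three refined tensors

x = np.random.rand(4, 8, 8)
out = triplet_attention(x)
print(out.shape)  # (4, 8, 8) -- output keeps the input shape
```

Note the negligible overhead claim: each branch only adds pooling, one small convolution over a 2-channel map, and an elementwise product, so the module introduces very few parameters compared to the backbone it is plugged into.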