Paper title
Exploring Self-Attention for Visual Intersection Classification
Paper authors
Paper abstract
In robot vision, self-attention has recently emerged as a technique for capturing non-local context. In this study, we introduce a self-attention mechanism into an intersection recognition system as a way to capture the non-local context of a scene. The intersection classification system comprises two distinct modules: (a) a first-person vision (FPV) module, which uses a short egocentric view sequence acquired while passing through the intersection, and (b) a third-person vision (TPV) module, which uses a single view taken immediately before entering the intersection. The self-attention mechanism is effective in the TPV module because most local patterns (e.g., road edges, buildings, and sky) are similar to each other, and thus the use of a non-local context (e.g., the angle between two diagonal corners around an intersection) would be effective. This study makes three major contributions. First, we propose a self-attention-based approach for intersection classification using TPV. Second, we present a practical system in which the self-attention-based TPV module is combined with an FPV module to improve overall recognition performance. Finally, experiments using the public KITTI dataset show that the self-attention-based system outperforms conventional recognition based on local patterns and recognition based on convolution operations.
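To illustrate how self-attention relates distant image regions, the following is a minimal sketch of scaled dot-product self-attention over a handful of patch descriptors. It is not the architecture used in the paper: the identity query/key/value projections, the function names `softmax` and `self_attention`, and the toy `patches` input are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with identity projections.

    features: list of N patch descriptors, each a list of D floats.
    Returns (outputs, weights), where weights[i][j] is how strongly
    patch i attends to patch j, regardless of their image distance.
    """
    d = len(features[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in features:
        # Every patch is compared with every other patch (non-local).
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in features]
        weights.append(softmax(scores))
    # Each output is a weighted mixture of all patch descriptors.
    outputs = [
        [sum(w * v[c] for w, v in zip(row, features)) for c in range(d)]
        for row in weights
    ]
    return outputs, weights


# Hypothetical toy input: four 2-D patch descriptors from a single view.
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
outputs, weights = self_attention(patches)
```

Because every patch attends to every other patch, the mixture for one region can draw on regions far away in the image, which is the property the abstract refers to as non-local context. A convolution with a small kernel, by contrast, only mixes neighbouring positions.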