Paper Title
Spatio-Temporal Tuples Transformer for Skeleton-Based Action Recognition
Paper Authors
Paper Abstract
Capturing the dependencies between joints is critical in skeleton-based action recognition tasks. Transformers show great potential for modeling the correlations of important joints. However, existing Transformer-based methods cannot capture the correlations of different joints across frames, even though such correlations are very useful, since different body parts (such as the arms and legs in "long jump") move together across adjacent frames. Focusing on this problem, a novel spatio-temporal tuples Transformer (STTFormer) method is proposed. The skeleton sequence is divided into several parts, and the consecutive frames contained in each part are encoded. Then, a spatio-temporal tuples self-attention module is proposed to capture the relationships of different joints in consecutive frames. In addition, a feature aggregation module is introduced between non-adjacent frames to enhance the ability to distinguish similar actions. Compared with state-of-the-art methods, our method achieves better performance on two large-scale datasets.
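The abstract describes grouping consecutive frames into tuples and applying self-attention jointly over all joints within each tuple, so that a joint in one frame can attend to a different joint in a neighboring frame. Below is a minimal sketch of that idea, not the authors' implementation: the `(N, C, T, V)` tensor layout, the `TupleSelfAttention` name, the tuple length, and the use of PyTorch's `nn.MultiheadAttention` are all illustrative assumptions.

```python
# Minimal sketch (assumptions noted above): split a skeleton sequence into tuples
# of consecutive frames and apply self-attention over every joint of every frame
# inside each tuple, modeling cross-frame, cross-joint correlations.
import torch
import torch.nn as nn


class TupleSelfAttention(nn.Module):
    def __init__(self, channels, tuple_len=6, num_heads=4):
        super().__init__()
        self.tuple_len = tuple_len
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x):
        # x: (N, C, T, V) -- batch, channels, frames, joints
        n, c, t, v = x.shape
        assert t % self.tuple_len == 0, "T must be divisible by the tuple length"
        num_tuples = t // self.tuple_len
        # Group consecutive frames into tuples and flatten each tuple's joints
        # into one attention sequence: (N * num_tuples, tuple_len * V, C).
        x = x.view(n, c, num_tuples, self.tuple_len, v)
        x = x.permute(0, 2, 3, 4, 1).reshape(n * num_tuples, self.tuple_len * v, c)
        # Self-attention within the tuple: a joint in frame i can attend to a
        # different joint in frame j, as long as both frames are in the tuple.
        out, _ = self.attn(x, x, x)
        # Restore the original layout: (N, C, T, V).
        out = out.reshape(n, num_tuples, self.tuple_len, v, c)
        out = out.permute(0, 4, 1, 2, 3).reshape(n, c, t, v)
        return out


if __name__ == "__main__":
    x = torch.randn(2, 64, 120, 25)           # e.g. 120 frames, 25 joints (NTU-style)
    module = TupleSelfAttention(channels=64, tuple_len=6)
    print(module(x).shape)                     # torch.Size([2, 64, 120, 25])
```

The feature aggregation between non-adjacent frames mentioned in the abstract would operate on the tuple-level outputs of such a module; it is not shown here.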