Paper Title
Point3D: Tracking Actions as Moving Points with 3D CNNs
Paper Authors
Paper Abstract
Spatio-temporal action recognition is a challenging task that involves detecting where and when actions occur. Current state-of-the-art action detectors are mostly anchor-based, requiring sensitive anchor designs and heavy computation to evaluate large numbers of anchor boxes. Motivated by nascent anchor-free approaches, we propose Point3D, a flexible and computationally efficient network with high precision for spatio-temporal action recognition. Point3D consists of a Point Head for action localization and a 3D Head for action classification. First, the Point Head tracks the center points and key points of humans to localize the bounding box of an action. These location features are then fed into a time-wise attention module to learn long-range dependencies across frames. The 3D Head is finally deployed for action classification. Point3D achieves state-of-the-art performance on the JHMDB, UCF101-24, and AVA benchmarks in terms of both frame-mAP and video-mAP. Comprehensive ablation studies also demonstrate the effectiveness of each module proposed in Point3D.
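To make the "time-wise attention" step concrete, below is a minimal pure-Python sketch of one plausible form of attention over per-frame location features: each frame's feature vector (e.g. flattened center/key-point coordinates from the Point Head) attends to all other frames via scaled dot-product self-attention. The function name, the absence of learned projections, and the feature layout are illustrative assumptions, not the paper's actual implementation.

```python
import math

def _softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def time_wise_attention(frame_feats):
    """Scaled dot-product self-attention across the time axis.

    frame_feats: list of T per-frame feature vectors (hypothetical
    flattened point locations). Returns T vectors, each a weighted
    mix of all frames, so every frame carries long-range temporal
    context. No learned Q/K/V projections here, for brevity.
    """
    d = len(frame_feats[0])
    out = []
    for q in frame_feats:
        # Similarity of this frame's query to every frame's key.
        scores = [_dot(q, k) / math.sqrt(d) for k in frame_feats]
        weights = _softmax(scores)
        # Weighted sum of all frames' (value) features.
        mixed = [sum(w * v[i] for w, v in zip(weights, frame_feats))
                 for i in range(d)]
        out.append(mixed)
    return out

# Example: three frames of 2-D location features.
feats = [[0.0, 1.0], [0.2, 0.9], [0.4, 0.8]]
attended = time_wise_attention(feats)  # still 3 vectors of length 2
```

In the full model these attended features would feed the downstream 3D Head; here the point is only the mechanism, i.e. mixing information across frames rather than within one.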