Title
Towards Improving Spatiotemporal Action Recognition in Videos
Authors
Abstract
Spatiotemporal action recognition deals with locating and classifying actions in videos. Motivated by the state-of-the-art real-time action detector You Only Watch Once (YOWO), we aim to modify its structure to increase action detection precision and reduce computational time. Specifically, we propose four novel approaches that attempt to improve YOWO and to address the class-imbalance issue in videos by modifying the loss function. We apply our modifications of YOWO to two moderate-sized datasets: the popular Joint-annotated Human Motion Database (J-HMDB-21) and a private dataset of restaurant video footage provided by a Carnegie Mellon University-based startup, Agot.AI. The latter involves fast-moving actions with small objects as well as imbalanced data classes, making the task of action localization more challenging. Our implementation of the proposed methods is available in the GitHub repository https://github.com/stoneMo/YOWOv2.
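The abstract does not specify how the loss function is modified to handle class imbalance. A common choice in detection settings is the focal loss (Lin et al.), which down-weights easy, well-classified examples so that rare classes contribute relatively more to the gradient. The sketch below is an illustrative pure-Python implementation of the standard focal loss, not the paper's actual modification:

```python
import math

def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Focal loss for a single binary prediction.

    p      : predicted probability of the positive class (0 < p < 1)
    target : 1 for a positive example, 0 for a negative one
    alpha, gamma : the usual focal-loss hyperparameters;
                   gamma=0 recovers alpha-weighted cross-entropy.
    """
    # p_t is the probability assigned to the true class.
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # The (1 - p_t)**gamma factor shrinks the loss of confident,
    # correct predictions, focusing training on hard examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct prediction incurs far less loss than a
# misclassified (e.g. rare-class) positive example:
print(focal_loss(0.95, 1))  # small
print(focal_loss(0.10, 1))  # large
```

With `gamma = 2`, an example predicted at `p_t = 0.9` is down-weighted by a factor of 100 relative to plain cross-entropy, which is what makes the loss robust to a large background or majority class.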