Unsupervised Temporal Feature Aggregation for Event Detection in Unstructured Sports Videos

Paper title

Unsupervised Temporal Feature Aggregation for Event Detection in Unstructured Sports Videos

Authors

Subhajit Chaudhury, Daiki Kimura, Phongtharin Vinayavekhin, Asim Munawar, Ryuki Tachibana, Koji Ito, Yuki Inaba, Minoru Matsumoto, Shuji Kidokoro, Hiroki Ozaki

Abstract

Image-based sports analytics enables automatic retrieval of key events in a game to speed up the analytics process for human experts. However, most existing methods focus on structured television broadcast video datasets with a straight and fixed camera having minimal variability in the capturing pose. In this paper, we study the case of event detection in sports videos for unstructured environments with arbitrary camera angles. The transition from structured to unstructured video analysis produces multiple challenges that we address in our paper. Specifically, we identify and solve two major problems: unsupervised identification of players in an unstructured setting and generalization of the trained models to pose variations due to arbitrary shooting angles. For the first problem, we propose a temporal feature aggregation algorithm using person re-identification features to obtain high player retrieval precision by boosting a weak heuristic scoring method. Additionally, we propose a data augmentation technique, based on a multi-modal image translation model, to reduce bias in the appearance of training samples. Experimental evaluations show that our proposed method improves precision for player retrieval from 0.78 to 0.86 for obliquely angled videos. Additionally, we obtain an improvement in F1 score for rally detection in table tennis videos from 0.79 with global frame-level features to 0.89 with our proposed player-level features. Please see the supplementary video submission at https://ibm.biz/BdzeZA.
