Paper Title
Recognizing Video Events with Varying Rhythms
Paper Authors
Paper Abstract
Recognizing video events in long, complex videos with multiple sub-activities has received persistent attention recently. This task is more challenging than traditional action recognition, which deals with short, relatively homogeneous video clips. In this paper, we investigate the problem of recognizing long and complex events with varying action rhythms, which has not been considered in the literature but is a practical challenge. Our work is inspired in part by how humans identify events with varying rhythms: quickly catching the frames that contribute most to a specific event. We propose a two-stage \emph{end-to-end} framework, in which the first stage selects the most significant frames and the second stage recognizes the event using the selected frames. Our model needs only \emph{event-level labels} in the training stage, and thus is more practical when sub-activity labels are missing or difficult to obtain. The results of extensive experiments show that our model achieves significant improvement in event recognition from long videos while maintaining high accuracy even when the test videos suffer from severe rhythm changes. This demonstrates the potential of our method for real-world video-based applications, where test and training videos can differ drastically in the rhythms of their sub-activities.
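To make the two-stage design concrete, below is a minimal PyTorch sketch of a frame-selection stage followed by an event-classification stage, trained end-to-end from event-level labels only. All module names, dimensions, and the top-k soft-selection mechanism are illustrative assumptions for this sketch, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class TwoStageEventRecognizer(nn.Module):
    """Hypothetical sketch of a two-stage frame-selection + recognition model.

    Stage 1 scores per-frame features and keeps the top-k frames;
    stage 2 classifies the event from the selected frames. Sizes and
    layer choices are assumptions, not the paper's architecture.
    """

    def __init__(self, feat_dim=2048, hidden_dim=512, num_events=20, k=16):
        super().__init__()
        self.k = k
        # Stage 1: per-frame significance scorer.
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1)
        )
        # Stage 2: event classifier over the selected frames.
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, num_events)
        )

    def forward(self, frames):
        # frames: (batch, num_frames, feat_dim) pre-extracted frame features.
        scores = self.scorer(frames).squeeze(-1)           # (B, T)
        weights = torch.softmax(scores, dim=1)             # soft significance
        topk = torch.topk(weights, self.k, dim=1)
        idx = topk.indices.unsqueeze(-1).expand(-1, -1, frames.size(-1))
        selected = torch.gather(frames, 1, idx)            # (B, k, D)
        # Re-weight the kept frames by their renormalized scores so the
        # selection stays differentiable with respect to the scorer.
        w = topk.values / topk.values.sum(dim=1, keepdim=True)
        pooled = (selected * w.unsqueeze(-1)).sum(dim=1)   # (B, D)
        return self.classifier(pooled)                     # event logits

# Training requires only one event-level label per video:
model = TwoStageEventRecognizer()
feats = torch.randn(4, 120, 2048)        # 4 videos, 120 frames each
labels = torch.randint(0, 20, (4,))      # event-level labels
loss = nn.CrossEntropyLoss()(model(feats), labels)
loss.backward()
```

Because the loss is computed only from the event prediction, gradients reach the frame scorer through the selection weights, so no per-frame or sub-activity supervision is needed, matching the weak-supervision setting described in the abstract.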