Paper Title
Intra- and Inter-Action Understanding via Temporal Action Parsing
Authors
Abstract
Current methods for action recognition primarily rely on deep convolutional networks to derive embeddings of visual and motion features. While these methods have demonstrated remarkable performance on standard benchmarks, we still need a better understanding of how videos, in particular their internal structures, relate to high-level semantics. Such an understanding may bring benefits in multiple aspects, e.g., interpretable predictions and even new methods that can take recognition performance to the next level. Towards this goal, we construct TAPOS, a new dataset developed on sport videos with manual annotations of sub-actions, and conduct a study of temporal action parsing on top of it. Our study shows that a sport activity usually consists of multiple sub-actions and that awareness of such temporal structures is beneficial to action recognition. We also investigate a number of temporal parsing methods, and thereon devise an improved method that is capable of mining sub-actions from training data without knowing their labels. On the constructed TAPOS, the proposed method is shown to reveal intra-action information, i.e., how action instances are made of sub-actions, and inter-action information, i.e., that one specific sub-action may commonly appear in various actions.