Rich Action-semantic Consistent Knowledge for Early Action Prediction

Paper Title

Rich Action-semantic Consistent Knowledge for Early Action Prediction

Authors

Liu, Xiaoli; Yin, Jianqin; Guo, Di; Liu, Huaping

Abstract

Early action prediction (EAP) aims to recognize human actions from only part of the action execution in ongoing videos, an important task for many practical applications. Most prior works treat partial or full videos as a whole, ignoring the rich action knowledge hidden in videos, i.e., the semantic consistencies among different partial videos. In contrast, we partition the original partial or full videos into a new series of partial videos and mine the Action-Semantic Consistent Knowledge (ASCK) among these new partial videos, which evolve at arbitrary progress levels. Moreover, we propose a novel Rich Action-semantic Consistent Knowledge network (RACK) for EAP under the teacher-student framework. First, we use a pre-trained two-stream model to extract video features. Second, we treat the RGB or flow features of the partial videos as nodes and their action-semantic consistencies as edges. Next, we build a bi-directional semantic graph for the teacher network and a single-directional semantic graph for the student network to model rich ASCK among partial videos. Mean squared error (MSE) and maximum mean discrepancy (MMD) losses are combined into the distillation loss, which transfers the rich ASCK of partial videos from the teacher network to the student network. Finally, we obtain the prediction by summing the logits of the different subnetworks and applying a softmax layer. Extensive experiments and ablation studies demonstrate the effectiveness of modeling rich ASCK for EAP. The proposed RACK achieves state-of-the-art performance on three benchmarks. The code is available at https://github.com/lily2lab/RACK.git.
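The abstract names three mechanisms: message passing over a semantic graph of partial-video nodes, an MSE + MMD distillation loss, and late fusion by summing subnetwork logits before a softmax. The NumPy sketch below illustrates each one under stated assumptions. It is not the authors' implementation, which is at the repository linked above. All function names are hypothetical. The reading of "bi-directional" (every node exchanges information with every other node) versus "single-directional" (a node receives only from nodes at the same or an earlier progress level) is an assumption. So are the one-round mean aggregation and the Gaussian kernel used for MMD.

```python
import numpy as np


def graph_masks(num_nodes):
    """Adjacency masks over partial-video nodes ordered by progress level.
    ASSUMPTION: bi-directional = fully connected; single-directional =
    node i receives only from nodes j <= i (lower-triangular mask)."""
    bi = np.ones((num_nodes, num_nodes))
    single = np.tril(np.ones((num_nodes, num_nodes)))
    return bi, single


def propagate(features, adjacency):
    """One round of mean aggregation: row i becomes the average of the
    feature rows that node i receives from (self-loops are included)."""
    degree = adjacency.sum(axis=1, keepdims=True)
    return (adjacency @ features) / degree


def gaussian_mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between samples x (n, d) and y (m, d).
    ASSUMPTION: Gaussian kernel with an illustrative bandwidth sigma."""
    def kernel(a, b):
        sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dist / (2.0 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()


def distillation_loss(student, teacher, mmd_weight=1.0):
    """MSE plus weighted MMD between student and teacher node features.
    mmd_weight is a hypothetical balancing factor."""
    mse = np.mean((student - teacher) ** 2)
    return mse + mmd_weight * gaussian_mmd2(student, teacher)


def fuse_predictions(logits_list):
    """Sum the logits of the subnetworks, then apply a numerically stable softmax."""
    z = np.sum(logits_list, axis=0)
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_nodes, dim, num_classes = 4, 8, 5
    # Hypothetical RGB features of 4 partial videos (random stand-ins).
    rgb = rng.normal(size=(num_nodes, dim))
    bi, single = graph_masks(num_nodes)
    teacher_feats = propagate(rgb, bi)       # teacher: bi-directional graph
    student_feats = propagate(rgb, single)   # student: single-directional graph
    print("distillation loss:", distillation_loss(student_feats, teacher_feats))
    # Hypothetical logits from an RGB subnetwork and a flow subnetwork.
    probs = fuse_predictions([rng.normal(size=num_classes), rng.normal(size=num_classes)])
    print("fused class probabilities:", probs)
```

The lower-triangular mask keeps the student causal: its earliest partial video cannot see later ones, which matches the EAP setting where future frames are unavailable. For simplicity the usage example feeds the same features to both graphs. In a teacher-student setup, the teacher would typically see features from fuller videos.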
