Blockwise Temporal-Spatial Pathway Network

Paper Title

Blockwise Temporal-Spatial Pathway Network

Authors

SeulGi Hong, Min-Kook Choi

Abstract

Video action recognition remains challenging because algorithms must consider not only spatial information but also temporal relations. We propose a 3D-CNN-based action recognition model, called the blockwise temporal-spatial pathway network (BTSNet), which can adjust the temporal and spatial receptive fields through multiple pathways. We designed a novel model inspired by an adaptive kernel selection-based model, an architecture for effective feature encoding that adaptively chooses spatial receptive fields for image recognition. Expanding this approach to the temporal domain, our model extracts temporal and channel-wise attention and fuses information from various candidate operations. For evaluation, we tested our proposed model on the UCF-101, HMDB-51, SVW, and Epic-Kitchen datasets and showed that it generalizes well without pretraining. BTSNet also provides interpretable visualization based on spatiotemporal channel-wise attention. Based on this visualization, we confirm that the blockwise temporal-spatial pathway yields a better representation for 3D convolutional blocks.
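The abstract describes fusing several candidate pathways with channel-wise attention, in the manner of adaptive kernel selection. The pure-Python sketch below illustrates only that generic selection step, modeled on selective-kernel-style fusion (an assumption about which kernel-selection model is meant). It assumes K pathway outputs have already been computed, for example by 3D convolutions with different temporal and spatial kernel sizes, and flattens each channel's T*H*W activations into one list. The function name `select_pathways`, the weight shapes, and the random weights are hypothetical; this does not reproduce BTSNet's actual blocks or its temporal attention, which the abstract does not specify.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_pathways(branches, w_reduce, w_select):
    """Fuse K pathway outputs with channel-wise attention (hypothetical sketch).

    branches: K lists of C channels; each channel is a flat list of T*H*W
        activations produced by one pathway (assumed precomputed).
    w_reduce: d x C matrix for the squeeze step (hypothetical learned weights).
    w_select: K matrices of shape C x d, one per pathway (hypothetical).
    Returns the fused C channels and the K x C attention weights.
    """
    num_paths = len(branches)
    num_channels = len(branches[0])
    size = len(branches[0][0])
    # 1. Element-wise sum of all pathways.
    summed = [[sum(branches[k][c][i] for k in range(num_paths)) for i in range(size)]
              for c in range(num_channels)]
    # 2. Global average pooling over time and space gives one value per channel.
    s = [sum(ch) / len(ch) for ch in summed]
    # 3. Compact descriptor z = ReLU(w_reduce @ s).
    z = [max(0.0, sum(w * sc for w, sc in zip(row, s))) for row in w_reduce]
    # 4. One logit per (pathway, channel); softmax across pathways per channel.
    logits = [[sum(w * zj for w, zj in zip(row, z)) for row in w_select[k]]
              for k in range(num_paths)]
    attn = [[0.0] * num_channels for _ in range(num_paths)]
    for c in range(num_channels):
        weights = softmax([logits[k][c] for k in range(num_paths)])
        for k in range(num_paths):
            attn[k][c] = weights[k]
    # 5. Attention-weighted sum of the pathways.
    fused = [[sum(attn[k][c] * branches[k][c][i] for k in range(num_paths)) for i in range(size)]
             for c in range(num_channels)]
    return fused, attn


# Usage: 3 pathways, 4 channels, a 2x3x3 (T x H x W) map, reduction size 2.
rng = random.Random(0)
K, C, THW, d = 3, 4, 2 * 3 * 3, 2
branches = [[[rng.gauss(0, 1) for _ in range(THW)] for _ in range(C)] for _ in range(K)]
w_reduce = [[rng.gauss(0, 0.5) for _ in range(C)] for _ in range(d)]
w_select = [[[rng.gauss(0, 0.5) for _ in range(d)] for _ in range(C)] for _ in range(K)]
fused, attn = select_pathways(branches, w_reduce, w_select)
print(len(fused), len(fused[0]))  # 4 18
```

Because the softmax runs across pathways separately for each channel, every channel can favor a different receptive field; the per-channel weights in `attn` are the kind of quantity one could visualize to see which pathway a block relies on.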
