A3D: Adaptive 3D Networks for Video Action Recognition
Abstract
This paper presents A3D, an adaptive 3D network that can perform inference under a wide range of computational constraints with one-time training. Instead of training multiple models in a grid-search manner, it generates good configurations by trading off between network width and spatio-temporal resolution. Furthermore, the computational cost can be adapted after the model is deployed, to meet variable constraints such as those on edge devices. Even under the same computational constraints, the performance of our adaptive networks is significantly boosted over the baseline counterparts by mutual training along the three dimensions. When a multi-pathway framework, e.g., SlowFast, is adopted, our adaptive method encourages a better trade-off between pathways than manual designs. Extensive experiments on the Kinetics dataset show the effectiveness of the proposed framework. The performance gain is also verified to transfer well across datasets and tasks. Code will be made available.
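The trade-off the abstract describes, choosing among network widths and spatio-temporal resolutions under a compute budget, can be sketched as a search over a small configuration grid. The sketch below is illustrative only: the search space, the cost model, and all names are assumptions, not details from the paper.

```python
from itertools import product

def relative_cost(width, frames, spatial):
    # Rough FLOPs model for a 3D CNN: quadratic in the width multiplier
    # (channels appear in both conv dimensions), linear in clip length,
    # and quadratic in the spatial scale.
    return (width ** 2) * frames * (spatial ** 2)

def feasible_configs(widths, frame_counts, spatial_scales, budget):
    """Enumerate (width, frames, spatial) settings whose estimated cost,
    relative to the largest configuration, fits within the budget."""
    full = relative_cost(max(widths), max(frame_counts), max(spatial_scales))
    out = []
    for w, t, s in product(widths, frame_counts, spatial_scales):
        if relative_cost(w, t, s) / full <= budget:
            out.append((w, t, s))
    return out

# Hypothetical search space: width multipliers, clip lengths, input scales.
configs = feasible_configs(
    widths=[0.5, 0.75, 1.0],
    frame_counts=[8, 16, 32],
    spatial_scales=[0.5, 0.75, 1.0],
    budget=0.25,  # keep configurations at <= 25% of the full model's cost
)
```

Under one-time training with shared weights, any configuration surviving this filter could be selected at deployment time without retraining, which is the behavior the abstract claims for edge devices.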