Optimization Planning for 3D ConvNets

Paper title

Optimization Planning for 3D ConvNets

Authors

Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Tao Mei

Abstract

Optimally learning 3D Convolutional Neural Networks (3D ConvNets) is not trivial, owing to the high complexity and the many options of the training scheme. The most common hand-tuning process starts by learning 3D ConvNets on short video clips and then learns long-term temporal dependency on lengthy clips, while gradually decaying the learning rate from high to low as training progresses. The fact that such a process comes with several heuristic settings motivates the search for an optimal "path" that automates the entire training. In this paper, we decompose the path into a series of training "states" and specify the hyper-parameters, e.g., learning rate and the length of input clips, in each state. The estimation of the knee point on the performance-epoch curve triggers the transition from one state to another. We perform dynamic programming over all the candidate states to plan the optimal permutation of states, i.e., the optimization path. Furthermore, we devise a new 3D ConvNet with a unique dual-head classifier design to improve spatial and temporal discrimination. Extensive experiments on seven public video recognition benchmarks demonstrate the advantages of our proposal. With the optimization planning, our 3D ConvNets achieve superior results compared with state-of-the-art recognition methods. More remarkably, we obtain top-1 accuracy of 80.5% and 82.7% on the Kinetics-400 and Kinetics-600 datasets, respectively. Source code is available at https://github.com/ZhaofanQiu/Optimization-Planning-for-3D-ConvNets.
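The abstract says a knee point on the performance-epoch curve triggers the move from one training state to the next, but it does not say how that knee is estimated. The following is a minimal sketch of one common heuristic, assuming the knee is the point lying farthest above the straight line joining the first and last points of the curve. The function name knee_epoch and the accuracy values are hypothetical and used only for illustration; the estimator in the paper may differ.

```python
from typing import Sequence


def knee_epoch(scores: Sequence[float]) -> int:
    """Return the index of the point farthest above the chord that joins
    the first and last points of the curve (an assumed heuristic, not
    necessarily the estimator used in the paper)."""
    n = len(scores)
    if n < 3:
        raise ValueError("need at least three points to locate a knee")
    first, last = scores[0], scores[-1]
    best_idx, best_gap = 0, float("-inf")
    for i, score in enumerate(scores):
        # Height of the chord at epoch i (linear interpolation).
        chord = first + (last - first) * i / (n - 1)
        gap = score - chord
        if gap > best_gap:
            best_idx, best_gap = i, gap
    return best_idx


# Made-up validation accuracy per epoch that rises quickly and then saturates.
curve = [0.10, 0.30, 0.45, 0.55, 0.60, 0.62, 0.63, 0.635, 0.64]
print(knee_epoch(curve))  # prints 3
```

In a scheduler built on this idea, training in the current state (a fixed learning rate and clip length) would stop at or shortly after the returned epoch, since later epochs add little accuracy, and the next state on the planned path would begin.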
