Paper Title

Re-parameterizing Your Optimizers rather than Architectures

Authors

Xiaohan Ding, Honghao Chen, Xiangyu Zhang, Kaiqi Huang, Jungong Han, Guiguang Ding

Abstract

The well-designed structures in neural networks reflect the prior knowledge incorporated into the models. However, though different models have various priors, we are used to training them with model-agnostic optimizers such as SGD. In this paper, we propose to incorporate model-specific prior knowledge into optimizers by modifying the gradients according to a set of model-specific hyper-parameters. Such a methodology is referred to as Gradient Re-parameterization, and the optimizers are named RepOptimizers. For the extreme simplicity of model structure, we focus on a VGG-style plain model and showcase that such a simple model trained with a RepOptimizer, which is referred to as RepOpt-VGG, performs on par with or better than the recent well-designed models. From a practical perspective, RepOpt-VGG is a favorable base model because of its simple structure, high inference speed and training efficiency. Compared to Structural Re-parameterization, which adds priors into models via constructing extra training-time structures, RepOptimizers require no extra forward/backward computations and solve the problem of quantization. We hope to spark further research beyond the realms of model structure design. Code and models are available at \url{https://github.com/DingXiaoH/RepOptimizers}.
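
To make the core idea of Gradient Re-parameterization concrete, below is a minimal sketch of an SGD-style optimizer that rescales each parameter's gradient by a fixed, model-specific multiplier before the update, instead of adding extra training-time branches to the architecture. This is an illustrative assumption, not the paper's actual RepOptimizer implementation: the class name ScaledSGD and the grad_scales argument are hypothetical, and the real hyper-parameters in the paper are derived from the equivalent structurally re-parameterized model.

```python
# Hypothetical sketch of Gradient Re-parameterization: plain SGD with momentum,
# except that each parameter's gradient is multiplied by a fixed, model-specific
# scale before the update. ScaledSGD / grad_scales are illustrative names only.
import torch
from torch.optim import Optimizer


class ScaledSGD(Optimizer):
    def __init__(self, params, lr=0.1, momentum=0.9, grad_scales=None):
        # grad_scales: dict mapping a parameter to a tensor (or scalar) that
        # rescales its gradient; parameters not listed use a scale of 1.
        defaults = dict(lr=lr, momentum=momentum)
        super().__init__(params, defaults)
        self.grad_scales = grad_scales or {}

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            lr, momentum = group["lr"], group["momentum"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad
                scale = self.grad_scales.get(p)  # model-specific prior
                if scale is not None:
                    g = g * scale
                state = self.state[p]
                buf = state.get("momentum_buffer")
                if buf is None:
                    buf = torch.clone(g).detach()
                    state["momentum_buffer"] = buf
                else:
                    buf.mul_(momentum).add_(g)
                p.add_(buf, alpha=-lr)


# Usage sketch: rescale the gradient of one layer as if it had extra
# training-time branches, without actually constructing those branches.
model = torch.nn.Linear(8, 8)
scales = {model.weight: torch.full_like(model.weight, 2.0)}
opt = ScaledSGD(model.parameters(), lr=0.1, momentum=0.9, grad_scales=scales)
loss = model(torch.randn(4, 8)).pow(2).mean()
loss.backward()
opt.step()
```

Because the prior lives entirely in the gradient rule, the forward and backward passes are those of the plain model, which is the source of the claimed savings in extra computation and the absence of re-parameterized branches at quantization time.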
