Paper Title
Optimizer Amalgamation
Paper Authors
Paper Abstract
Selecting an appropriate optimizer for a given problem is of major interest for researchers and practitioners. Many analytical optimizers have been proposed using a variety of theoretical and empirical approaches; however, none can offer a universal advantage over other competitive optimizers. We are thus motivated to study a new problem named Optimizer Amalgamation: how can we best combine a pool of "teacher" optimizers into a single "student" optimizer that can have stronger problem-specific performance? In this paper, we draw inspiration from the field of "learning to optimize" to use a learnable amalgamation target. First, we define three differentiable amalgamation mechanisms to amalgamate a pool of analytical optimizers by gradient descent. Then, in order to reduce the variance of the amalgamation process, we also explore methods to stabilize it by perturbing the amalgamation target. Finally, we present experiments showing the superiority of our amalgamated optimizer compared to its amalgamated components and learning-to-optimize baselines, and the efficacy of our variance-reducing perturbations. Our code and pre-trained models are publicly available at http://github.com/VITA-Group/OptimizerAmalgamation.
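To make the abstract's core idea concrete, the sketch below illustrates, under stated assumptions, how a small learned "student" optimizer could be trained by gradient descent to minimize an unrolled task loss while matching an amalgamation target built from a pool of analytical "teacher" updates. This is a minimal, hypothetical sketch rather than the authors' implementation: the PyTorch framing, the quadratic optimizee, the two stand-in teachers (plain SGD and a sign-based step), and names such as StudentOptimizer, teacher_updates, and amalgamation_step are all illustrative assumptions, and the mean-of-teachers target is only one plausible reading of the three mechanisms the paper defines.

```python
# Illustrative sketch only (not the paper's code): a learnable "student"
# optimizer distilled toward a pool of analytical "teacher" updates.
import torch
import torch.nn as nn

class StudentOptimizer(nn.Module):
    """Tiny learned optimizer: maps a per-parameter gradient to an update."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, grad):
        # Coordinate-wise update, as in typical learning-to-optimize setups.
        g = grad.reshape(-1, 1)
        return self.net(g).reshape(grad.shape)

def teacher_updates(grad, lr=0.1):
    """Stand-in pool of analytical 'teacher' updates: SGD and a sign step."""
    return [-lr * grad, -lr * torch.sign(grad)]

def amalgamation_step(student, meta_opt, unroll=10):
    # Inner problem (optimizee): a random quadratic ||Ax - b||^2.
    A = torch.randn(8, 8)
    b = torch.randn(8)
    x = torch.zeros(8, requires_grad=True)

    task_loss = torch.tensor(0.0)
    distill_loss = torch.tensor(0.0)
    for _ in range(unroll):
        loss = ((A @ x - b) ** 2).mean()
        grad, = torch.autograd.grad(loss, x, create_graph=True)
        update = student(grad)
        # Amalgamation target: here the mean teacher update (one possible choice).
        target = torch.stack(teacher_updates(grad)).mean(0)
        distill_loss = distill_loss + ((update - target.detach()) ** 2).mean()
        x = x + update
        task_loss = task_loss + loss

    # Meta-update of the student by gradient descent through the unrolled loop.
    meta_opt.zero_grad()
    (task_loss + distill_loss).backward()
    meta_opt.step()
    return task_loss.item()

student = StudentOptimizer()
meta_opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for step in range(100):
    amalgamation_step(student, meta_opt)
```

The key point this sketch tries to convey is that the amalgamation target is differentiable: backpropagating through the unrolled inner loop (enabled by create_graph=True) is what lets the pool of analytical optimizers be fused into the student by ordinary gradient descent.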