Paper Title
Why to "grow" and "harvest" deep learning models?
Paper Authors
Paper Abstract
Current expectations from training deep learning models with gradient-based methods include: 1) transparency; 2) high convergence rates; 3) strong inductive biases. While state-of-the-art methods with adaptive learning rate schedules are fast, they still fail to meet the other two requirements. We suggest reconsidering neural network models in terms of single-species population dynamics, where adaptation arises naturally from open-ended processes of "growth" and "harvesting". We show that stochastic gradient descent (SGD) with two balanced, pre-defined values of per capita growth and harvesting rates outperforms the most common adaptive gradient methods on all three requirements.
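To make the population-dynamics framing concrete, the sketch below shows one way a per capita growth rate r and harvesting rate h could drive an SGD step-size schedule, following the single-species model dN/dt = rN - hN. This is a minimal illustration under assumed semantics; the function name `population_lr_schedule` and the choice to treat the learning rate as the "population" are hypothetical and are not the authors' implementation.

```python
# Hypothetical sketch: a learning-rate schedule driven by single-species
# population dynamics with per capita growth (r) and harvesting (h) rates.
# The names r, h, and population_lr_schedule are illustrative assumptions,
# not the method described in the paper.

def population_lr_schedule(lr0, r, h, num_steps):
    """Evolve a 'population' value used as the SGD step size.

    dN/dt = r * N - h * N  (per capita growth minus per capita harvesting).
    With r == h the two rates are balanced and the step size stays constant;
    r > h grows it over time, r < h shrinks it geometrically.
    """
    lr = lr0
    schedule = []
    for _ in range(num_steps):
        lr = lr + (r - h) * lr  # forward-Euler step of the population ODE
        schedule.append(lr)
    return schedule


if __name__ == "__main__":
    # Balanced rates keep the step size flat.
    print(population_lr_schedule(0.1, r=0.01, h=0.01, num_steps=3))
    # A slight excess of harvesting decays the step size each iteration.
    print(population_lr_schedule(0.1, r=0.01, h=0.02, num_steps=3))
```

In this toy reading, "balanced" growth and harvesting rates correspond to a stable population, i.e. a schedule that neither explodes nor collapses; the paper's actual coupling between population dynamics and SGD may differ.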