Paper Title
Depth-Adaptive Neural Networks from the Optimal Control viewpoint
Paper Authors
Paper Abstract
In recent years, deep learning has been connected with optimal control as a way to define a notion of a continuous underlying learning problem. In this view, neural networks can be interpreted as a discretization of a parametric Ordinary Differential Equation (ODE) which, in the limit, defines a continuous-depth neural network. The learning task then consists of finding the best ODE parameters for the problem under consideration, and their number increases with the accuracy of the time discretization. Although important steps have been taken to realize the advantages of such continuous formulations, most current learning techniques fix the discretization (i.e., the number of layers is fixed). In this work, we propose an iterative adaptive algorithm in which we progressively refine the time discretization (i.e., we increase the number of layers). Provided that certain tolerances are met across the iterations, we prove that the strategy converges to the underlying continuous problem. One salient advantage of such a shallow-to-deep approach is that it helps in practice to benefit from the higher approximation properties of deep networks by mitigating over-parametrization issues. The performance of the approach is illustrated in several numerical examples.
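To make the two ingredients of the abstract concrete, the following is a minimal illustrative sketch, not the authors' code: a residual network read as a forward-Euler discretization of the ODE dx/dt = f(x, theta(t)) on [0, T], and a shallow-to-deep loop that doubles the number of layers and warm-starts the deeper network from the shallower one. The choice of PyTorch, the names `ODENet`, `refine`, and `shallow_to_deep`, and the loss-stagnation stopping test are all assumptions standing in for the paper's tolerance criterion.

```python
import copy
import torch
import torch.nn as nn


class ODENet(nn.Module):
    """Forward-Euler discretization of dx/dt = f(x, theta(t)) with n_layers steps."""

    def __init__(self, dim, n_layers, T=1.0):
        super().__init__()
        self.dt = T / n_layers
        # One parameter set theta_k per time step t_k (here: a small residual block).
        self.layers = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.Tanh()) for _ in range(n_layers)]
        )

    def forward(self, x):
        # x_{k+1} = x_k + dt * f(x_k, theta_k)  -- the explicit Euler step.
        for layer in self.layers:
            x = x + self.dt * layer(x)
        return x


def refine(coarse, dim):
    """Double the number of layers (halve the time step); warm-start by
    piecewise-constant interpolation in time: each coarse layer's parameters
    initialize two consecutive fine steps."""
    fine = ODENet(dim, 2 * len(coarse.layers))
    for k, layer in enumerate(coarse.layers):
        fine.layers[2 * k] = copy.deepcopy(layer)
        fine.layers[2 * k + 1] = copy.deepcopy(layer)
    return fine


def shallow_to_deep(data, targets, dim, n_layers=2, n_refinements=3, tol=1e-4):
    """Train at the current depth, refine when the loss stagnates, repeat."""
    net = ODENet(dim, n_layers)
    loss_fn = nn.MSELoss()
    for level in range(n_refinements + 1):
        opt = torch.optim.Adam(net.parameters(), lr=1e-2)
        prev = float("inf")
        for _ in range(500):  # inner optimization at the current depth
            opt.zero_grad()
            loss = loss_fn(net(data), targets)
            loss.backward()
            opt.step()
            if abs(prev - loss.item()) < tol:  # stagnation test (a stand-in
                break                          # for the paper's tolerance check)
            prev = loss.item()
        if level < n_refinements:
            net = refine(net, dim)  # increase the depth and warm-start
    return net


# Example usage on a synthetic regression task:
torch.manual_seed(0)
x = torch.randn(128, 4)
y = torch.tanh(x @ torch.randn(4, 4))
model = shallow_to_deep(x, y, dim=4)
```

The piecewise-constant prolongation in `refine` is one simple way to realize the warm start: two Euler steps of size dt/2 with the same parameters closely reproduce one step of size dt, so the refined network starts near the coarse optimum rather than from a random initialization, which is what lets the shallow-to-deep strategy mitigate the over-parametrization issues mentioned above.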