Paper Title
Discretize-Optimize vs. Optimize-Discretize for Time-Series Regression and Continuous Normalizing Flows
Paper Authors
Paper Abstract
We compare the discretize-optimize (Disc-Opt) and optimize-discretize (Opt-Disc) approaches for time-series regression and continuous normalizing flows (CNFs) using neural ODEs. Neural ODEs are ordinary differential equations (ODEs) with neural network components. Training a neural ODE is an optimal control problem in which the weights are the controls and the hidden features are the states. Every training iteration involves solving an ODE forward and another backward in time, which can require large amounts of computation, time, and memory. Comparing the Opt-Disc and Disc-Opt approaches on image classification tasks, Gholami et al. (2019) suggest that Disc-Opt is preferable due to the guaranteed accuracy of its gradients. In this paper, we extend the comparison to neural ODEs for time-series regression and CNFs. Unlike in classification, meaningful models for these tasks must also satisfy additional requirements beyond accurate final-time output, e.g., the invertibility of the CNF. Through our numerical experiments, we demonstrate that with careful numerical treatment, Disc-Opt methods can achieve performance at inference similar to that of Opt-Disc with drastically reduced training costs. Disc-Opt reduced costs in six out of seven separate problems, with training time reductions ranging from 39% to 97%; in one case, Disc-Opt reduced training from nine days to less than one day.
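To make the control-theoretic framing in the abstract concrete, training can be written (in notation assumed here, not taken from the paper) as a constrained problem:

\[
\min_{\theta}\; \mathbb{E}_{x \sim p_{\text{data}}}\big[\mathcal{L}\big(z_x(T)\big)\big]
\quad\text{subject to}\quad
\dot{z}_x(t) = f_\theta\big(z_x(t), t\big),\quad z_x(0) = x,
\]

where the weights \(\theta\) act as the controls and the hidden features \(z_x(t)\) as the states. Disc-Opt first replaces the ODE constraint with a fixed numerical scheme and differentiates through that scheme, so autodiff returns the exact gradient of the discretized objective; Opt-Disc derives continuous adjoint equations first and discretizes them afterward, so the backward solve need not be consistent with the forward one and the resulting gradient is only approximate for the discrete loss.

The sketch below is a minimal illustration of the Disc-Opt pattern, not the authors' code; the network architecture, step count, and toy regression data are invented for the example. A fixed-step RK4 discretization sits inside the computational graph, so backpropagation differentiates through the solver itself.

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """Velocity field f_theta(z, t): the neural network component of the ODE."""
    def __init__(self, dim, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, width), nn.Tanh(), nn.Linear(width, dim)
        )

    def forward(self, z, t):
        # Append time as an extra input feature.
        tt = t.expand(z.shape[0], 1)
        return self.net(torch.cat([z, tt], dim=1))

def rk4_step(f, z, t, h):
    """One classical Runge-Kutta step; autodiff through it gives Disc-Opt gradients."""
    k1 = f(z, t)
    k2 = f(z + 0.5 * h * k1, t + 0.5 * h)
    k3 = f(z + 0.5 * h * k2, t + 0.5 * h)
    k4 = f(z + h * k3, t + h)
    return z + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def integrate(f, z0, t0=0.0, t1=1.0, n_steps=8):
    """Discretize-then-optimize: the fixed-step solver is part of the graph."""
    h = (t1 - t0) / n_steps
    z = z0
    for i in range(n_steps):
        z = rk4_step(f, z, torch.tensor(t0 + i * h), h)
    return z

torch.manual_seed(0)
f = ODEFunc(dim=2)
z0, target = torch.randn(32, 2), torch.randn(32, 2)   # toy final-time regression
opt = torch.optim.Adam(f.parameters(), lr=1e-2)

for epoch in range(100):
    opt.zero_grad()
    loss = ((integrate(f, z0) - target) ** 2).mean()
    loss.backward()   # exact gradient of the *discretized* objective
    opt.step()

# Opt-Disc would instead solve a continuous adjoint ODE backward in time for
# the gradient (as in, e.g., the torchdiffeq library's adjoint mode), trading
# exactness of the gradient for the discrete loss against memory savings.
```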