Paper Title
Replica Exchange for Non-Convex Optimization
Paper Authors
Paper Abstract
Gradient descent (GD) is known to converge quickly for convex objective functions, but it can be trapped at local minima. On the other hand, Langevin dynamics (LD) can explore the state space and find global minima, but in order to give accurate estimates, LD needs to run with a small discretization step size and a weak stochastic force, which in general slow down its convergence. This paper shows that these two algorithms and their non-swapping variants can "collaborate" through a simple exchange mechanism, in which they swap their current positions if LD yields a lower objective function. This idea can be seen as the singular limit of the replica-exchange technique from the sampling literature. We show that this new algorithm converges to the global minimum linearly with high probability, assuming the objective function is strongly convex in a neighborhood of the unique global minimum. By replacing gradients with stochastic gradients, and adding a proper threshold to the exchange mechanism, our algorithm can also be used in online settings. We also study non-swapping variants of the algorithm, which achieve similar performance. We further verify our theoretical results through numerical experiments and observe superior performance of the proposed algorithm over running GD or LD alone.
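
As a rough illustration of the exchange mechanism described in the abstract, the following is a minimal Python sketch (not the authors' implementation): a GD chain and a discretized-LD chain run in parallel, and they swap positions whenever the LD iterate attains a lower objective value. The objective f, its gradient grad_f, the step size eta, the temperature, and the threshold (which the abstract introduces for the online, stochastic-gradient setting) are all illustrative placeholders.

    import numpy as np

    def replica_exchange(f, grad_f, x0, y0, eta=1e-3, temperature=0.1,
                         n_iters=10_000, threshold=0.0, rng=None):
        """Sketch: x follows gradient descent, y follows discretized
        Langevin dynamics (GD plus Gaussian noise), and the two chains
        swap their current positions when f(y) < f(x) - threshold."""
        rng = np.random.default_rng() if rng is None else rng
        x, y = np.asarray(x0, float), np.asarray(y0, float)
        for _ in range(n_iters):
            # GD step: exploit the local landscape quickly.
            x = x - eta * grad_f(x)
            # LD step: gradient step plus stochastic force for exploration.
            noise = rng.standard_normal(y.shape)
            y = y - eta * grad_f(y) + np.sqrt(2 * eta * temperature) * noise
            # Exchange: if LD found a lower objective value, swap positions
            # so GD continues from the better point.
            if f(y) < f(x) - threshold:
                x, y = y, x
        return x  # the GD iterate serves as the estimate of the global minimizer

For instance, on a double-well objective such as f(x) = (x**2 - 1)**2 + 0.3*x, a GD chain started near the shallow local minimum stays trapped on its own, while the swap lets it jump to the basin of the global minimum once the LD chain wanders there.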