Title

Computing the Variance of Shuffling Stochastic Gradient Algorithms via Power Spectral Density Analysis

Author

Domingo-Enrich, Carles

Abstract

When solving finite-sum minimization problems, two common alternatives to stochastic gradient descent (SGD) that enjoy theoretical benefits are random reshuffling (SGD-RR) and shuffle-once (SGD-SO), in which functions are sampled in cycles without replacement. Under a convenient stochastic noise approximation that holds empirically, we study the stationary variances of the iterates of SGD, SGD-RR and SGD-SO, whose leading terms decrease in this order, and obtain simple approximations. To obtain our results, we study the power spectral density of the stochastic gradient noise sequences. Our analysis extends beyond SGD to SGD with momentum and to a stochastic version of Nesterov's accelerated gradient method. We perform experiments on quadratic objective functions to test the validity of our approximation and the correctness of our findings.
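To make the three sampling schemes concrete, here is a minimal toy simulation. It is not the paper's method or experimental setup. It assumes a one-dimensional finite sum f(x) = (1/n) * sum_i 0.5 * (x - b_i)^2 with minimizer x* = mean(b), and uses as a proxy for the stationary variance the post-burn-in average of (x_k - x*)^2 over time and over independent chains. The helper `stationary_msd`, the data `b`, the step size and all other parameters are hypothetical choices for illustration. For plain SGD on this problem the iteration is x_{k+1} - x* = (1 - lr)(x_k - x*) + lr * eps_k with i.i.d. eps_k of variance var(b), so the stationary variance has the closed form lr * var(b) / (2 - lr), which the sketch uses as a sanity check.

```python
import numpy as np


def stationary_msd(b, lr, scheme, n_epochs=300, burn_in_epochs=60,
                   n_chains=200, rng=None):
    """Estimate E[(x_k - x*)^2] after burn-in for the (hypothetical) toy sum
    f(x) = (1/n) * sum_i 0.5 * (x - b_i)^2, whose minimizer is x* = mean(b).

    scheme: "sgd" (sample indices with replacement),
            "rr"  (random reshuffling: fresh permutation every epoch),
            "so"  (shuffle once: one permutation per chain, reused every epoch).
    """
    rng = np.random.default_rng() if rng is None else rng
    b = np.asarray(b, dtype=float)
    n = b.size
    x_star = b.mean()
    x = np.full(n_chains, x_star)  # every chain starts at the minimizer
    # One permutation per chain, drawn up front; only used by "so".
    fixed_perm = np.argsort(rng.random((n_chains, n)), axis=1)
    total, count = 0.0, 0
    for epoch in range(n_epochs):
        if scheme == "sgd":
            idx = rng.integers(0, n, size=(n_chains, n))
        elif scheme == "rr":
            idx = np.argsort(rng.random((n_chains, n)), axis=1)
        elif scheme == "so":
            idx = fixed_perm
        else:
            raise ValueError(f"unknown scheme: {scheme!r}")
        for j in range(n):
            grad = x - b[idx[:, j]]  # gradient of f_i at x, one per chain
            x = x - lr * grad
            if epoch >= burn_in_epochs:
                total += float(np.sum((x - x_star) ** 2))
                count += n_chains
    return total / count


rng = np.random.default_rng(0)
b = rng.standard_normal(10)  # hypothetical data for n = 10 components
lr = 0.01
msd = {s: stationary_msd(b, lr, s, rng=rng) for s in ("sgd", "rr", "so")}
# Closed-form stationary variance of plain SGD on this quadratic.
sgd_theory = lr * np.var(b) / (2 - lr)
for s, v in msd.items():
    print(f"{s}: {v:.3e}")
print(f"sgd closed form: {sgd_theory:.3e}")
```

The chains are vectorized so the loop stays short. For SO each chain draws its own permutation, so averaging over chains also averages over the choice of shuffle. The printed numbers depend on the seed and toy parameters and are not results from the paper; the sketch only lets one check, on a small example, whether the ordering SGD, SGD-RR, SGD-SO described in the abstract shows up.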