Paper Title

Scaling Forward Gradient With Local Losses

Paper Authors

Mengye Ren, Simon Kornblith, Renjie Liao, Geoffrey Hinton

Paper Abstract

Forward gradient learning computes a noisy directional gradient and is a biologically plausible alternative to backprop for learning deep neural networks. However, the standard forward gradient algorithm, when applied naively, suffers from high variance when the number of parameters to be learned is large. In this paper, we propose a series of architectural and algorithmic modifications that together make forward gradient learning practical for standard deep learning benchmark tasks. We show that it is possible to substantially reduce the variance of the forward gradient estimator by applying perturbations to activations rather than weights. We further improve the scalability of forward gradient by introducing a large number of local greedy loss functions, each of which involves only a small number of learnable parameters, and a new MLPMixer-inspired architecture, LocalMixer, that is more suitable for local learning. Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
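
Below is a minimal, illustrative JAX sketch of the activity-perturbed forward-gradient idea the abstract describes. The toy linear layer, the stand-in local loss `toy_loss`, and all sizes are assumptions for exposition, not the paper's actual code. `jax.jvp` computes the directional derivative of the loss along a random activation perturbation in a single forward-mode pass; scaling the perturbation by that scalar gives an unbiased estimate of the activation gradient, which then maps to a weight gradient through the layer's local structure.

```python
# Minimal sketch of an activity-perturbed forward-gradient estimator (illustrative only).
import jax
import jax.numpy as jnp

def layer(w, x):
    # Pre-activation of a single linear layer; the perturbation is applied
    # to this activation vector rather than to the weights.
    return w @ x

def toy_loss(z):
    # Stand-in for a local greedy loss defined directly on the activations.
    return jnp.sum(jnp.tanh(z) ** 2)

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
w = jax.random.normal(k1, (16, 32)) * 0.1   # 16 output units, 32 inputs (toy sizes)
x = jax.random.normal(k2, (32,))

z = layer(w, x)
u = jax.random.normal(k3, z.shape)          # random perturbation in activation space

# Forward-mode directional derivative of the loss along u:
# d = <dL/dz, u>, computed without a backward pass through the loss.
_, d = jax.jvp(toy_loss, (z,), (u,))

# Unbiased estimate of the activation gradient, then the exact local mapping
# back to the weights for z = w @ x, namely dL/dw = (dL/dz) x^T.
g_z = d * u
g_w = jnp.outer(g_z, x)

# Reference gradient via backprop, for comparison only.
g_w_true = jax.grad(lambda w_: toy_loss(layer(w_, x)))(w)
print(jnp.dot(g_w.ravel(), g_w_true.ravel()))  # positively aligned in expectation
```

In this sketch the random tangent lives in activation space, so the variance of the estimator grows with the number of perturbed activation units rather than with the much larger number of weights, which is the variance reduction the abstract attributes to perturbing activations instead of weights.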
