Paper Title
Better scalability under potentially heavy-tailed feedback
Paper Authors
Paper Abstract
We study scalable alternatives to robust gradient descent (RGD) techniques that can be used when the losses and/or gradients may be heavy-tailed, though this is unknown to the learner. The core technique is simple: instead of trying to robustly aggregate gradients at each step, which is costly and leads to sub-optimal dimension dependence in risk bounds, we focus computational effort on robustly choosing (or newly constructing) a strong candidate based on a collection of cheap stochastic sub-processes that can be run in parallel. The exact selection process depends on the convexity of the underlying objective, but in all cases our selection technique amounts to a robust form of boosting the confidence of weak learners. In addition to formal guarantees, we provide an empirical analysis of robustness to perturbations of experimental conditions, under both sub-Gaussian and heavy-tailed data, along with applications to a variety of benchmark datasets. The overall take-away is an extensible procedure that is simple to implement and trivial to parallelize, which keeps the formal merits of RGD methods but scales much better to large learning problems.
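To make the "divide, run cheap sub-processes, then robustly select" idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's exact procedure: the names `sgd_subprocess` and `robust_select` are hypothetical, plain least-squares SGD stands in for the cheap sub-process, and a generic median-distance rule stands in for the paper's convexity-dependent selection step (a standard way to "boost the confidence" of weak candidates).

```python
import numpy as np

def sgd_subprocess(X, y, lr=0.01, epochs=5, rng=None):
    # Cheap stochastic sub-process: plain SGD for least squares on one data split.
    # (Illustrative stand-in for whatever weak learner the sub-process runs.)
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
    return w

def robust_select(candidates):
    # Robust selection: keep the candidate whose median distance to the other
    # candidates is smallest. This is one generic "confidence boosting" rule,
    # assumed here for illustration; the paper's rule depends on convexity.
    C = np.stack(candidates)                                         # (k, d)
    dists = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=-1)   # pairwise
    scores = np.median(dists, axis=1)
    return C[np.argmin(scores)]

# Usage sketch: split the data into k disjoint folds, run one independent
# sub-process per fold (embarrassingly parallel), then robustly select.
rng = np.random.default_rng(0)
n, d, k = 3000, 5, 10
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + rng.standard_t(df=2, size=n)   # heavy-tailed noise
folds = np.array_split(rng.permutation(n), k)
candidates = [sgd_subprocess(X[idx], y[idx], rng=rng) for idx in folds]
w_hat = robust_select(candidates)
print("estimation error:", np.linalg.norm(w_hat - w_true))
```

The point of the sketch is the cost profile: the expensive robust operation happens once over k candidate vectors at the end, rather than over the gradients at every iteration, and each fold's sub-process can be dispatched to a separate worker.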