Paper Title

Better scalability under potentially heavy-tailed feedback

Paper Authors

Holland, Matthew J.

Paper Abstract

We study scalable alternatives to robust gradient descent (RGD) techniques that can be used when the losses and/or gradients can be heavy-tailed, though this will be unknown to the learner. The core technique is simple: instead of trying to robustly aggregate gradients at each step, which is costly and leads to sub-optimal dimension dependence in risk bounds, we instead focus computational effort on robustly choosing (or newly constructing) a strong candidate based on a collection of cheap stochastic sub-processes which can be run in parallel. The exact selection process depends on the convexity of the underlying objective, but in all cases, our selection technique amounts to a robust form of boosting the confidence of weak learners. In addition to formal guarantees, we also provide empirical analysis of robustness to perturbations to experimental conditions, under both sub-Gaussian and heavy-tailed data, along with applications to a variety of benchmark datasets. The overall take-away is an extensible procedure that is simple to implement, trivial to parallelize, which keeps the formal merits of RGD methods but scales much better to large learning problems.
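To make the overall pattern concrete, below is a minimal sketch, in Python, of the "run cheap stochastic sub-processes in parallel, then robustly select among their outputs" idea described in the abstract. The helper names (`sgd_candidate`, `median_of_means`, `robust_select`), the squared-loss linear model, and the use of a median-of-means validation score as the selection rule are illustrative assumptions only; the paper's actual selection procedure depends on the convexity of the objective and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_candidate(X, y, lr=0.01, epochs=5, seed=0):
    """Cheap sub-process: plain SGD on the squared loss of a linear model."""
    r = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in r.permutation(len(y)):
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
    return w

def median_of_means(values, k=5):
    """Robust estimate of the mean: median of k block means."""
    blocks = np.array_split(np.random.permutation(values), k)
    return np.median([b.mean() for b in blocks])

def robust_select(candidates, X_val, y_val, k=5):
    """Pick the candidate whose robust validation-risk estimate is smallest."""
    scores = [median_of_means((X_val @ w - y_val) ** 2, k) for w in candidates]
    return candidates[int(np.argmin(scores))]

# Toy usage: heavy-tailed noise, several independent SGD runs on disjoint splits.
n, d = 2000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.standard_t(df=2, size=n)   # heavy-tailed noise

splits = np.array_split(np.arange(n - 500), 8)  # hold out 500 points for selection
candidates = [sgd_candidate(X[idx], y[idx], seed=s) for s, idx in enumerate(splits)]
w_hat = robust_select(candidates, X[-500:], y[-500:])
```

In this sketch each sub-process sees a disjoint data split and can trivially run in parallel; only the final selection step pays the (one-off) cost of robust aggregation, which is the cost structure the abstract contrasts with per-step robust gradient aggregation.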
