Paper Title

Black-Box Generalization: Stability of Zeroth-Order Learning

Paper Authors

Nikolakakis, Konstantinos E., Haddadpour, Farzin, Kalogerias, Dionysios S., Karbasi, Amin

Paper Abstract

We provide the first generalization error analysis for black-box learning through derivative-free optimization. Under the assumption of a Lipschitz and smooth unknown loss, we consider the Zeroth-order Stochastic Search (ZoSS) algorithm, that updates a $d$-dimensional model by replacing stochastic gradient directions with stochastic differences of $K+1$ perturbed loss evaluations per dataset (example) query. For both unbounded and bounded possibly nonconvex losses, we present the first generalization bounds for the ZoSS algorithm. These bounds coincide with those for SGD, and rather surprisingly are independent of $d$, $K$ and the batch size $m$, under appropriate choices of a slightly decreased learning rate. For bounded nonconvex losses and a batch size $m=1$, we additionally show that both generalization error and learning rate are independent of $d$ and $K$, and remain essentially the same as for the SGD, even for two function evaluations. Our results extensively extend and consistently recover established results for SGD in prior work, on both generalization bounds and corresponding learning rates. If additionally $m=n$, where $n$ is the dataset size, we derive generalization guarantees for full-batch GD as well.
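As a concrete illustration of the update described in the abstract, below is a minimal sketch of a ZoSS-style step in Python: the stochastic gradient direction is replaced by a finite-difference estimate built from $K+1$ loss evaluations, one at the current model and one at each of $K$ randomly perturbed copies. The function names, the smoothing radius `mu`, and the choice of Gaussian perturbation directions here are illustrative assumptions, not necessarily the paper's exact estimator.

```python
import numpy as np

def zoss_gradient_estimate(loss, w, K, mu, rng):
    """Estimate the gradient of `loss` at `w` from K+1 evaluations:
    one at w itself and one at each of K randomly perturbed points."""
    d = w.shape[0]
    base = loss(w)                      # 1 evaluation at the current model
    grad_est = np.zeros(d)
    for _ in range(K):                  # K perturbed evaluations
        u = rng.standard_normal(d)      # random search direction
        grad_est += (loss(w + mu * u) - base) / mu * u
    return grad_est / K

def zoss_step(loss, w, lr, K=1, mu=1e-4, rng=None):
    """One ZoSS-style update: move against the finite-difference estimate."""
    rng = rng or np.random.default_rng()
    return w - lr * zoss_gradient_estimate(loss, w, K, mu, rng)

# Toy usage: per-example squared loss standing in for the unknown black-box loss.
if __name__ == "__main__":
    x = np.array([1.0, -2.0, 0.5])
    loss = lambda w: 0.5 * np.sum((w - x) ** 2)
    w = np.zeros(3)
    for _ in range(200):
        w = zoss_step(loss, w, lr=0.05, K=2)
    print(w)  # converges toward x
```

With $K=1$ this uses only two function evaluations per query, the regime in which the abstract notes the generalization error and learning rate remain essentially the same as for SGD.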
