论文标题
大型模型可以预测时间:通用性属性和平均案例分析
Halting Time is Predictable for Large Models: A Universality Property and Average-case Analysis
论文作者
论文摘要
平均情况分析计算在所有可能输入上平均算法的复杂性。与最坏情况分析相比,它更多地代表了算法的典型行为,但在优化中仍未探索。一个困难是分析可以取决于输入到模型的概率分布。但是,我们表明,对于一类大规模的问题,这些问题并非如此,这些大规模问题接受了一阶方法,包括随机权重的随机最小二乘和一个隐藏的层神经网络。实际上,停止时间具有普遍性:它与概率分布无关。由于消除了该障碍以进行平均案例分析,我们提供了第一个显式平均案例收敛速率,显示传统最坏情况分析未捕获的更复杂性。最后,数值模拟表明这种通用性属性具有更一般的算法和问题类别。
Average-case analysis computes the complexity of an algorithm averaged over all possible inputs. Compared to worst-case analysis, it is more representative of the typical behavior of an algorithm, but remains largely unexplored in optimization. One difficulty is that the analysis can depend on the probability distribution of the inputs to the model. However, we show that this is not the case for a class of large-scale problems trained with first-order methods including random least squares and one-hidden layer neural networks with random weights. In fact, the halting time exhibits a universality property: it is independent of the probability distribution. With this barrier for average-case analysis removed, we provide the first explicit average-case convergence rates showing a tighter complexity not captured by traditional worst-case analysis. Finally, numerical simulations suggest this universality property holds for a more general class of algorithms and problems.