Paper Title
Finding Global Minima via Kernel Approximations
Paper Authors
Paper Abstract
We consider the global minimization of smooth functions based solely on function evaluations. Algorithms that achieve the optimal number of function evaluations for a given precision level typically rely on explicitly constructing an approximation of the function, which is then minimized with algorithms that have exponential running-time complexity. In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum. This is done using infinite sums of squares of smooth functions and has strong links with polynomial sum-of-squares hierarchies. Leveraging recent representation properties of reproducing kernel Hilbert spaces, the infinite-dimensional optimization problem can be solved by subsampling in time polynomial in the number of function evaluations, with theoretical guarantees on the obtained minimum. Given $n$ samples, the computational cost is $O(n^{3.5})$ in time and $O(n^2)$ in space, and we achieve a convergence rate to the global optimum of $O(n^{-m/d + 1/2 + 3/d})$, where $m$ is the degree of differentiability of the function and $d$ the number of dimensions. The rate is nearly optimal in the case of Sobolev functions and, more generally, makes the proposed method particularly suitable for functions with a large number of derivatives. Indeed, when $m$ is on the order of $d$, the convergence rate to the global optimum does not suffer from the curse of dimensionality, which affects only the worst-case constants (which we track explicitly throughout the paper).
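To make the abstract's approach concrete: after sampling $n$ points, requiring $f(x_i) - c$ to be a sum of squares in the RKHS yields a finite-dimensional semidefinite program over a PSD matrix $B$, whose optimal $c$ lower-bounds (and approaches) the global minimum. The sketch below is illustrative only: the Gaussian kernel, the 1-D test function `f`, the bandwidth `sigma`, and the trace regularizer `lam` are all assumptions, and a generic SDP solver (cvxpy) stands in for the paper's damped-Newton scheme that attains the stated $O(n^{3.5})$ cost.

```python
import numpy as np
import cvxpy as cp

# Hypothetical smooth test function on [-1, 1]; global minimum near x ~ -0.3.
def f(x):
    return np.sin(5 * x[0]) + x[0] ** 2

d, n = 1, 60
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(n, d))   # n sample points
fX = np.array([f(x) for x in X])          # function evaluations (the only access to f)

# Kernel matrix; the Gaussian kernel is a stand-in (the paper analyzes Sobolev kernels).
sigma = 0.3
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / (2 * sigma ** 2)) + 1e-10 * np.eye(n)
R = np.linalg.cholesky(K)                 # K = R @ R.T, so Phi(x_i) ~ R[i]

# SDP: maximize c such that f(x_i) - c is a sum of squares,
# i.e. f(x_i) - c = Phi(x_i)^T B Phi(x_i) with B PSD, plus a trace penalty.
lam = 1e-6
c = cp.Variable()
B = cp.Variable((n, n), PSD=True)
constraints = [fX[i] - c == R[i] @ B @ R[i] for i in range(n)]
prob = cp.Problem(cp.Maximize(c - lam * cp.trace(B)), constraints)
prob.solve()
print("estimated global minimum:", c.value)
```

Per the paper's guarantees, as $n$ grows the optimal $c$ converges to the global minimum at the rate quoted above, with the dimension entering only through the worst-case constants.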