Scalable First-Order Bayesian Optimization via Structured Automatic Differentiation

Title

Scalable First-Order Bayesian Optimization via Structured Automatic Differentiation

Authors

Sebastian Ament, Carla Gomes

Abstract

Bayesian Optimization (BO) has shown great promise for the global optimization of functions that are expensive to evaluate, but despite many successes, standard approaches can struggle in high dimensions. To improve the performance of BO, prior work suggested incorporating gradient information into a Gaussian process surrogate of the objective, giving rise to kernel matrices of size $nd \times nd$ for $n$ observations in $d$ dimensions. Naïvely multiplying with (resp. inverting) these matrices requires $\mathcal{O}(n^2d^2)$ (resp. $\mathcal{O}(n^3d^3)$) operations, which becomes infeasible for moderate dimensions and sample sizes. Here, we observe that a wide range of kernels gives rise to structured matrices, enabling an exact $\mathcal{O}(n^2d)$ matrix-vector multiply for gradient observations and $\mathcal{O}(n^2d^2)$ for Hessian observations. Beyond canonical kernel classes, we derive a programmatic approach to leveraging this type of structure for transformations and combinations of the discussed kernel classes, which constitutes a structure-aware automatic differentiation algorithm. Our methods apply to virtually all canonical kernels and automatically extend to complex kernels, like the neural network, radial basis function network, and spectral mixture kernels, without any additional derivations, enabling flexible, problem-dependent modeling while scaling first-order BO to high $d$.
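
To make the complexity claim concrete, here is a minimal sketch for one special case, assuming the RBF kernel $k(x, y) = \exp(-\|x - y\|^2 / 2)$ with unit lengthscale. For this kernel, each $d \times d$ block of gradient-gradient covariances equals $k(x, y)\,(I - rr^\top)$ with $r = x - y$, a scaled identity plus a rank-one term, so a block can be applied to a vector in $\mathcal{O}(d)$ rather than $\mathcal{O}(d^2)$. The function names below are hypothetical, only the gradient-gradient blocks are covered (no value observations), and this is an illustration of the kind of structure the abstract describes, not the authors' structure-aware automatic differentiation algorithm.

```python
import numpy as np


def rbf_grad_block(x, y):
    """Dense d x d block of gradient-gradient covariances for the RBF kernel.

    For k(x, y) = exp(-||x - y||^2 / 2), the block is k(x, y) * (I - r r^T)
    with r = x - y: a scaled identity plus a rank-one term.
    """
    r = x - y
    k = np.exp(-0.5 * r @ r)
    return k * (np.eye(len(r)) - np.outer(r, r))


def dense_mvm(X, V):
    """Reference O(n^2 d^2) product: build every d x d block explicitly."""
    n, d = X.shape
    out = np.zeros((n, d))
    for i in range(n):
        for j in range(n):
            out[i] += rbf_grad_block(X[i], X[j]) @ V[j]
    return out


def structured_mvm(X, V):
    """O(n^2 d) product that never forms a d x d block.

    Uses (I - r r^T) v = v - r (r . v), which costs O(d) per block.
    """
    n, d = X.shape
    out = np.zeros((n, d))
    for i in range(n):
        R = X[i] - X                              # (n, d): r_ij for all j
        k = np.exp(-0.5 * np.sum(R * R, axis=1))  # (n,): kernel values k_ij
        rv = np.sum(R * V, axis=1)                # (n,): inner products r_ij . v_j
        out[i] = k @ (V - R * rv[:, None])        # sum_j k_ij (v_j - r_ij (r_ij . v_j))
    return out


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # n = 6 points in d = 4 dimensions
V = rng.normal(size=(6, 4))  # one length-d vector per point
max_abs_diff = np.max(np.abs(dense_mvm(X, V) - structured_mvm(X, V)))
```

Both functions compute the same product, so `max_abs_diff` should sit at floating-point round-off level. The structured version still visits all $n^2$ pairs of points, but it replaces each $d \times d$ block product with a few length-$d$ vector operations.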
