Title

On Information Gain and Regret Bounds in Gaussian Process Bandits

Authors

Sattar Vakili, Kia Khezeli, Victor Picheny

Abstract

Consider the sequential optimization of an expensive-to-evaluate and possibly non-convex objective function $f$ from noisy feedback, which can be considered a continuum-armed bandit problem. Upper bounds on the regret performance of several learning algorithms (GP-UCB, GP-TS, and their variants) are known under both a Bayesian setting (when $f$ is a sample from a Gaussian process (GP)) and a frequentist setting (when $f$ lives in a reproducing kernel Hilbert space). The regret bounds often rely on the maximal information gain $\gamma_T$ between $T$ observations and the underlying GP (surrogate) model. We provide general bounds on $\gamma_T$ based on the decay rate of the eigenvalues of the GP kernel, whose specialisation for commonly used kernels improves the existing bounds on $\gamma_T$, and subsequently the regret bounds relying on $\gamma_T$, under numerous settings. For the Matérn family of kernels, where lower bounds on $\gamma_T$ and on regret under the frequentist setting are known, our results close a huge polynomial gap in $T$ between the upper and lower bounds (up to factors logarithmic in $T$).
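The quantity at the center of the abstract, the information gain of $T$ noisy observations of a GP, has a closed form: $I(y_T; f) = \frac{1}{2}\log\det(I + \sigma^{-2} K_T)$, where $K_T$ is the kernel matrix of the observation points and $\sigma^2$ is the noise variance ($\gamma_T$ is its maximum over point sets). A minimal numerical sketch of this formula, assuming an RBF kernel and illustrative lengthscale/noise values not taken from the paper:

```python
import numpy as np

def information_gain(X, lengthscale=0.2, noise_var=0.01):
    """I(y_T; f) = 0.5 * log det(I + sigma^-2 K_T) for a GP with an
    RBF kernel, observed at the 1-D points in X (illustrative only;
    the paper's bounds concern the maximum of this over point sets)."""
    sq_dists = (X[:, None] - X[None, :]) ** 2
    K = np.exp(-sq_dists / (2 * lengthscale ** 2))  # kernel matrix K_T
    T = len(X)
    # slogdet is numerically stabler than log(det(...)) for large T
    _, logdet = np.linalg.slogdet(np.eye(T) + K / noise_var)
    return 0.5 * logdet

# For smooth kernels the gain grows sublinearly in T, which is what
# makes the gamma_T-based regret bounds meaningful.
rng = np.random.default_rng(0)
for T in (10, 100, 1000):
    X = rng.uniform(0, 1, T)
    print(T, information_gain(X))
```

Running this shows the gain increasing with $T$ but far more slowly than linearly, consistent with the eigenvalue-decay picture the paper uses to bound $\gamma_T$.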
