Paper Title

Invariant Lipschitz Bandits: A Side Observation Approach

Paper Authors

Nam Phuong Tran, Long Tran-Thanh

Paper Abstract

Symmetry arises in many optimization and decision-making problems, and has attracted considerable attention from the optimization community: by utilizing the existence of such symmetries, the process of searching for optimal solutions can be improved significantly. Despite its success in (offline) optimization, the utilization of symmetries has not been well examined in online optimization settings, especially in the bandit literature. As such, in this paper we study the invariant Lipschitz bandit setting, a subclass of Lipschitz bandits in which the reward function and the set of arms are preserved under a group of transformations. We introduce an algorithm named \texttt{UniformMesh-N}, which naturally integrates side observations using group orbits into the \texttt{UniformMesh} algorithm (\cite{Kleinberg2005_UniformMesh}), which uniformly discretizes the set of arms. Using the side-observation approach, we prove an improved regret upper bound that depends on the cardinality of the group, given that the group is finite. We also prove a matching regret lower bound for the invariant Lipschitz bandit class (up to logarithmic factors). We hope that our work will ignite further investigation of symmetry in bandit theory and sequential decision-making theory in general.
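To make the side-observation idea in the abstract concrete, the sketch below combines a uniform discretization of a one-dimensional arm set with observation sharing across group orbits. This is a minimal illustration under simplifying assumptions, not the paper's exact construction: the arm set, the choice of group (a single reflection of [0, 1]), the UCB-style index, and all function names are illustrative.

```python
import numpy as np

def uniform_mesh(num_points):
    """Uniformly discretize the arm set [0, 1] (the UniformMesh idea)."""
    return np.linspace(0.0, 1.0, num_points)

def orbit_indices(idx, mesh, group):
    """Indices of mesh points in the orbit of mesh[idx] under the (finite) group."""
    orbit = set()
    for g in group:
        transformed = g(mesh[idx])
        # Snap the transformed point back onto the mesh.
        orbit.add(int(np.argmin(np.abs(mesh - transformed))))
    return orbit

def uniform_mesh_n_sketch(reward_fn, horizon, num_points=32,
                          group=(lambda x: x, lambda x: 1.0 - x)):
    """Illustrative sketch: uniform mesh + side observations via group orbits."""
    mesh = uniform_mesh(num_points)
    counts = np.zeros(num_points)   # observations (real or side) per arm
    means = np.zeros(num_points)    # empirical mean reward per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        # UCB-style index; unexplored arms are pulled first.
        ucb = np.where(counts > 0,
                       means + np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1)),
                       np.inf)
        pulled = int(np.argmax(ucb))
        reward = reward_fn(mesh[pulled])
        total_reward += reward
        # Side observations: since the reward function is invariant under the
        # group, the same observation is credited to every arm in the orbit.
        for j in orbit_indices(pulled, mesh, group):
            counts[j] += 1
            means[j] += (reward - means[j]) / counts[j]
    return total_reward

# Example usage with a reward function that is invariant under x -> 1 - x:
# uniform_mesh_n_sketch(lambda x: 1.0 - abs(x - 0.5), horizon=1000)
```

Because each pull updates the statistics of every arm in its orbit, the effective number of samples per arm grows roughly in proportion to the orbit size, which is the intuition behind the improved regret bound's dependence on the cardinality of the group.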
