Stochastic Linear Contextual Bandits with Diverse Contexts

Paper title

Stochastic Linear Contextual Bandits with Diverse Contexts

Authors

Weiqiang Wu, Jing Yang, Cong Shen

Abstract

In this paper, we investigate the impact of context diversity on stochastic linear contextual bandits. As opposed to the previous view that contexts lead to more difficult bandit learning, we show that when the contexts are sufficiently diverse, the learner is able to utilize the information obtained during exploitation to shorten the exploration process, thus achieving reduced regret. We design the LinUCB-d algorithm, and propose a novel approach to analyze its regret performance. The main theoretical result is that under the diverse context assumption, the cumulative expected regret of LinUCB-d is bounded by a constant. As a by-product, our results improve the previous understanding of LinUCB and strengthen its performance guarantee.
