Paper Title
Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs
Paper Authors
Paper Abstract
In this paper, we study the problem of stochastic linear bandits with finite action sets. Most existing work assumes the payoffs are bounded or sub-Gaussian, an assumption that may be violated in some scenarios such as financial markets. To settle this issue, we analyze linear bandits with heavy-tailed payoffs, where the payoffs admit finite $1+\epsilon$ moments for some $\epsilon \in (0,1]$. Through median of means and dynamic truncation, we propose two novel algorithms that enjoy a sublinear regret bound of $\widetilde{O}(d^{\frac{1}{2}}T^{\frac{1}{1+\epsilon}})$, where $d$ is the dimension of the contextual information and $T$ is the time horizon. Meanwhile, we provide an $\Omega(d^{\frac{\epsilon}{1+\epsilon}}T^{\frac{1}{1+\epsilon}})$ lower bound, which implies that our upper bound matches the lower bound up to polylogarithmic factors in the order of $d$ and $T$ when $\epsilon = 1$. Finally, we conduct numerical experiments to demonstrate the effectiveness of our algorithms, and the empirical results strongly support our theoretical guarantees.
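The two robust-estimation ideas named in the abstract, median of means and truncation, can be illustrated in isolation. The Python sketch below is a minimal toy, not the paper's bandit algorithms: it estimates the mean of a heavy-tailed Pareto payoff distribution, and the function names, the group count `k`, and the fixed truncation threshold are hypothetical choices for illustration (the paper's dynamic truncation grows the threshold over rounds).

```python
import numpy as np

def median_of_means(samples, k):
    """Median-of-means: split the samples into k groups, average
    each group, and return the median of the k group means."""
    groups = np.array_split(np.asarray(samples), k)
    return np.median([g.mean() for g in groups])

def truncated_mean(samples, threshold):
    """Truncated mean: zero out samples whose magnitude exceeds the
    threshold before averaging. (A fixed threshold is used here only
    to keep the sketch short; dynamic truncation would increase it
    with the round index.)"""
    samples = np.asarray(samples)
    return np.mean(np.where(np.abs(samples) <= threshold, samples, 0.0))

# Heavy-tailed payoffs: a Pareto distribution with shape a = 1.5 has
# a finite (1+eps)-th moment only for eps < 0.5 and infinite variance.
rng = np.random.default_rng(0)
payoffs = rng.pareto(1.5, size=10_000) + 1.0

print(np.mean(payoffs))                # plain empirical mean (high variance)
print(median_of_means(payoffs, k=20))  # robust median-of-means estimate
print(truncated_mean(payoffs, 50.0))   # robust truncated-mean estimate
```

Both estimators concentrate around the true mean under only a finite $1+\epsilon$ moment condition, which is what lets the paper's algorithms replace the usual sub-Gaussian assumption while retaining a sublinear regret bound.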