Paper Title
An Efficient Algorithm for Fair Multi-Agent Multi-Armed Bandit with Low Regret
Paper Authors
Paper Abstract
Recently, a multi-agent variant of the classical multi-armed bandit was proposed to tackle fairness issues in online learning. Inspired by a long line of work in social choice and economics, the goal is to optimize the Nash social welfare instead of the total utility. Unfortunately, previous algorithms are either inefficient or achieve sub-optimal regret in terms of the number of rounds $T$. We propose a new efficient algorithm whose regret is lower even than that of previous inefficient ones. For $N$ agents, $K$ arms, and $T$ rounds, our approach has a regret bound of $\tilde{O}(\sqrt{NKT} + NK)$. This improves on the previous approach, whose regret bound is $\tilde{O}(\min(NK, \sqrt{N} K^{3/2})\sqrt{T})$. We also complement our efficient algorithm with an inefficient one achieving $\tilde{O}(\sqrt{KT} + N^2 K)$ regret. Experimental findings confirm the effectiveness of our efficient algorithm compared to the previous approaches.
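For context on the objective named in the abstract, prior work on fair multi-agent bandits commonly formalizes the problem as follows; the notation here ($p$, $\mu_i$, $p^*$) is an illustrative reconstruction of that standard setup, not taken from this paper. At each round the learner plays a distribution $p_t$ over the $K$ arms, and one common definition takes the Nash social welfare to be the product of the agents' expected rewards:

$$
\mathrm{NSW}(p) = \prod_{i=1}^{N} \big( p^{\top} \mu_i \big),
\qquad
\mathrm{Regret}(T) = T \cdot \mathrm{NSW}(p^{*}) - \sum_{t=1}^{T} \mathrm{NSW}(p_t),
$$

where $p_t \in \Delta_K$ is the distribution over arms played at round $t$, $\mu_i \in [0,1]^K$ is agent $i$'s mean-reward vector, and $p^{*} = \arg\max_{p \in \Delta_K} \mathrm{NSW}(p)$. Under this objective a single fixed arm is generally suboptimal, since maximizing a product of utilities favors distributions that give every agent nontrivial expected reward, which is what distinguishes this setting from total-utility bandits.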