Paper Title

Robust Multi-Agent Multi-Armed Bandits

Paper Authors

Daniel Vial, Sanjay Shakkottai, R. Srikant

Abstract

Recent works have shown that agents facing independent instances of a stochastic $K$-armed bandit can collaborate to decrease regret. However, these works assume that each agent always recommends their individual best-arm estimates to other agents, which is unrealistic in envisioned applications (machine faults in distributed computing or spam in social recommendation systems). Hence, we generalize the setting to include $n$ honest and $m$ malicious agents who recommend best-arm estimates and arbitrary arms, respectively. We first show that even with a single malicious agent, existing collaboration-based algorithms fail to improve regret guarantees over a single-agent baseline. We propose a scheme where honest agents learn who is malicious and dynamically reduce communication with (i.e., "block") them. We show that collaboration indeed decreases regret for this algorithm, assuming $m$ is small compared to $K$ but without assumptions on malicious agents' behavior, thus ensuring that our algorithm is robust against any malicious recommendation strategy.
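To make the blocking idea concrete, here is a minimal, self-contained sketch (not the paper's actual algorithm): a single honest agent samples each peer's recommended arm on a hypothetical Bernoulli instance and blocks any peer whose recommendation is empirically far below the best recommendation seen. The peer names, arm means, sample budget `m_test`, and the 0.25 blocking margin are all illustrative assumptions.

```python
import random

def blocking_sketch(K=10, m_test=60, seed=0):
    """Illustrative sketch of recommendation-based blocking (hypothetical
    setup, not the paper's algorithm): sample each peer's recommended arm
    and block peers whose recommendations look clearly suboptimal."""
    rng = random.Random(seed)
    # Hypothetical Bernoulli bandit instance: arm 0 is the best arm.
    means = [0.4] * K
    means[0] = 0.9
    # Honest peers recommend their best-arm estimate (arm 0);
    # malicious peers recommend arbitrary (here: bad) arms.
    recs = {"honest_1": 0, "honest_2": 0,
            "malicious_1": K - 1, "malicious_2": 5}

    # Sample each peer's recommended arm m_test times.
    est = {}
    for peer, arm in recs.items():
        wins = sum(rng.random() < means[arm] for _ in range(m_test))
        est[peer] = wins / m_test

    # Block peers whose recommended arm is empirically far from the best
    # recommendation observed (margin 0.25 is an arbitrary choice here).
    best = max(est.values())
    blocked = {peer for peer, mu in est.items() if mu < best - 0.25}
    return blocked
```

With enough samples per recommendation, honest peers (who point at the good arm) survive while peers recommending clearly inferior arms are cut off, after which the agent would restrict communication to the unblocked set. The paper's scheme does this dynamically over time rather than in a single test phase.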
