Paper Title

Decentralized Multi-player Multi-armed Bandits with No Collision Information

Authors

Chengshuai Shi, Wei Xiong, Cong Shen, Jing Yang

Abstract

The decentralized stochastic multi-player multi-armed bandit (MP-MAB) problem, where the collision information is not available to the players, is studied in this paper. Building on the seminal work of Boursier and Perchet (2019), we propose error correction synchronization involving communication (EC-SIC), whose regret is shown to approach that of the centralized stochastic MP-MAB with collision information. By recognizing that the communication phase without collision information corresponds to the Z-channel model in information theory, the proposed EC-SIC algorithm applies optimal error correction coding for the communication of reward statistics. A fixed message length, as opposed to the logarithmically growing one in Boursier and Perchet (2019), also plays a crucial role in controlling the communication loss. Experiments with practical Z-channel codes, such as repetition code, flip code and modified Hamming code, demonstrate the superiority of EC-SIC in both synthetic and real-world datasets.
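
The Z-channel idea in the abstract can be illustrated with a minimal sketch of the simplest code it names, the repetition code. This is not the authors' implementation. It assumes a Z-channel in which a transmitted 1 always arrives as 1, while a transmitted 0 is misread as 1 with some crossover probability `p`; the direction of the asymmetry, the value of `p`, and all function names here are hypothetical choices made for illustration.

```python
import random


def z_channel(bit, p, rng):
    """Assumed Z-channel: a 1 always arrives as 1; a 0 flips to 1 with probability p."""
    if bit == 1:
        return 1
    return 1 if rng.random() < p else 0


def encode_repetition(bit, n):
    """Repeat one message bit n times."""
    return [bit] * n


def decode_repetition(received):
    """A 1 is never corrupted into a 0, so a single observed 0 proves a 0 was sent."""
    return 0 if 0 in received else 1


def send_message(bits, n, p, rng):
    """Send each message bit through the channel with an n-fold repetition code."""
    decoded = []
    for bit in bits:
        received = [z_channel(x, p, rng) for x in encode_repetition(bit, n)]
        decoded.append(decode_repetition(received))
    return decoded


if __name__ == "__main__":
    rng = random.Random(0)
    message = [1, 0, 0, 1, 0, 1, 1, 0]
    print(send_message(message, n=5, p=0.3, rng=rng))
```

Under these assumptions a 1 is always decoded correctly, and a 0 is decoded wrongly only when all `n` copies flip, which happens with probability `p**n`. This "any 0 wins" rule exploits the one-sided errors of the Z-channel, whereas majority voting would be the natural decoder on a symmetric channel.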
