记忆有限的一般图表中的协作学习：复杂性，可靠性和可靠性

论文标题

记忆有限的一般图表中的协作学习：复杂性，可靠性和可靠性

Collaborative Learning in General Graphs with Limited Memorization: Complexity, Learnability, and Reliability

论文作者

Li, Feng, Yuan, Xuyang, Wang, Lina, Yang, Huan, Yu, Dongxiao, Lv, Weifeng, Cheng, Xiuzhen

论文摘要

我们在一般图中考虑了一个由代理人任意连接的一般图表中的K臂匪徒问题，并且它们每个都具有有限的记忆能力和通信带宽。目标是让每个代理商最终学习最好的手臂。在这些研究中假定通信图应是完整或结构良好的，而这种假设在实践中并不总是有效的。此外，由于代理商记住和交流很少的经验，因此记忆和沟通带宽也限制了代理商的合作。此外，代理商可能会被损坏以与同龄人分享伪造的经验，而在记忆和沟通方面的资源限制可能会大大限制学习过程的可靠性。为了解决上述问题，我们提出了三阶的协作学习算法。在每个步骤中，代理商通过一般通信图中的轻重量随机步行彼此分享他们的最新经历，然后根据从同龄人那里收到的建议做出决定要拉动哪种武器。代理商最终根据通过拉动武器获得的奖励来更新收养（即对武器的偏好）。我们的理论分析表明，当有足够数量的代理参与协作学习过程时，所有代理人最终都会以很高的可能性学习最佳的手臂，即使记忆力有限和轻巧的交流。我们还在理论分析中揭示了对算法可以忍受的损坏代理数量的上限。我们提出的三阶段协作学习算法的功效最终通过对合成和真实数据集的广泛实验进行了验证。

We consider a K-armed bandit problem in general graphs where agents are arbitrarily connected and each of them has limited memorizing capabilities and communication bandwidth. The goal is to let each of the agents eventually learn the best arm. It is assumed in these studies that the communication graph should be complete or well-structured, whereas such an assumption is not always valid in practice. Furthermore, limited memorization and communication bandwidth also restrict the collaborations of the agents, since the agents memorize and communicate very few experiences. Additionally, an agent may be corrupted to share falsified experiences to its peers, while the resource limit in terms of memorization and communication may considerably restrict the reliability of the learning process. To address the above issues, we propose a three-staged collaborative learning algorithm. In each step, the agents share their latest experiences with each other through light-weight random walks in a general communication graph, and then make decisions on which arms to pull according to the recommendations received from their peers. The agents finally update their adoptions (i.e., preferences to the arms) based on the reward obtained by pulling the arms. Our theoretical analysis shows that, when there are a sufficient number of agents participating in the collaborative learning process, all the agents eventually learn the best arm with high probability, even with limited memorizing capabilities and light-weight communications. We also reveal in our theoretical analysis the upper bound on the number of corrupted agents our algorithm can tolerate. The efficacy of our proposed three-staged collaborative learning algorithm is finally verified by extensive experiments on both synthetic and real datasets.

下载PDF全文

下载文献需遵守相关版权规定

论文标题