联合在线匪徒

论文标题

联合在线匪徒

Federated Online Clustering of Bandits

论文作者

Liu, Xutong, Zhao, Haoru, Yu, Tong, Li, Shuai, Lui, John C. S.

论文摘要

上下文多臂强盗（MAB）是推荐系统中重要的顺序决策问题。一系列称为土匪（俱乐部）聚类的作品，利用了对用户的协作效果，并显着提高了建议质量。由于应用程序规模不断提高和对隐私的公众担忧，因此需求不断增长，以使用户数据分散并将匪徒学习推向本地服务器端。但是，现有的俱乐部算法是在集中设置下设计的，该设置在中央服务器上提供数据。我们专注于研究Bandit（FCLUB）问题的联合在线聚类，该问题旨在最大程度地减少遗憾，同时满足隐私和沟通的考虑。我们为群集检测设计了一种新的基于阶段的方案，并为解决此问题的合作强盗学习而设计了一种新型的异步通信协议。为了保护用户的隐私，以前的差异隐私（DP）定义不是很合适，我们提出了一个在用户群集级别上起作用的新DP概念。我们提供了严格的证据，以证明我们的算法同时实现了（聚类）DP，sublrinear沟通复杂性和sublritear遗憾。最后，实验评估表明，与基准算法相比，我们的表现出色。

Contextual multi-armed bandit (MAB) is an important sequential decision-making problem in recommendation systems. A line of works, called the clustering of bandits (CLUB), utilize the collaborative effect over users and dramatically improve the recommendation quality. Owing to the increasing application scale and public concerns about privacy, there is a growing demand to keep user data decentralized and push bandit learning to the local server side. Existing CLUB algorithms, however, are designed under the centralized setting where data are available at a central server. We focus on studying the federated online clustering of bandit (FCLUB) problem, which aims to minimize the total regret while satisfying privacy and communication considerations. We design a new phase-based scheme for cluster detection and a novel asynchronous communication protocol for cooperative bandit learning for this problem. To protect users' privacy, previous differential privacy (DP) definitions are not very suitable, and we propose a new DP notion that acts on the user cluster level. We provide rigorous proofs to show that our algorithm simultaneously achieves (clustered) DP, sublinear communication complexity and sublinear regret. Finally, experimental evaluations show our superior performance compared with benchmark algorithms.

下载PDF全文

下载文献需遵守相关版权规定

论文标题