Paper Title
Communication Efficient Distributed Learning for Kernelized Contextual Bandits
Paper Authors
Paper Abstract
We tackle the communication efficiency challenge of learning kernelized contextual bandits in a distributed setting. Despite recent advances in communication-efficient distributed bandit learning, existing solutions are restricted to simple models such as multi-armed bandits and linear bandits, which hampers their practical utility. In this paper, instead of assuming the existence of a linear reward mapping from the features to the expected rewards, we consider non-linear reward mappings by letting agents collaboratively search in a reproducing kernel Hilbert space (RKHS). This introduces significant challenges in communication efficiency, as distributed kernel learning requires the transfer of raw data, leading to a communication cost that grows linearly with respect to the time horizon $T$. We address this issue by having all agents communicate via a common Nyström embedding that is updated adaptively as more data points are collected. We rigorously prove that our algorithm attains sub-linear rates in both regret and communication cost.
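To illustrate the kind of compression a Nyström embedding provides, the sketch below maps raw contexts to finite-dimensional features computed from a small set of landmark points, so agents can exchange low-dimensional quantities instead of raw data. This is a minimal illustration under assumed choices, not the paper's algorithm: the RBF kernel, the uniform landmark sampling, and the function names (rbf_kernel, nystrom_embedding) are all hypothetical, and the paper's adaptive embedding-update scheme is not reproduced.

    # Illustrative sketch (not the paper's algorithm): a Nyström embedding maps
    # raw contexts into a low-dimensional feature space so that distributed
    # agents can share compact features instead of raw data points.
    import numpy as np

    def rbf_kernel(X, Y, gamma=1.0):
        """RBF kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
        sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq_dists)

    def nystrom_embedding(X, landmarks, gamma=1.0, eps=1e-6):
        """Map each row of X to phi(x) = K_mm^{-1/2} k_m(x), where K_mm is the
        kernel matrix on the landmark points and k_m(x) holds kernel values
        between x and the landmarks."""
        K_mm = rbf_kernel(landmarks, landmarks, gamma)
        # Symmetric inverse square root via a regularized eigendecomposition.
        vals, vecs = np.linalg.eigh(K_mm + eps * np.eye(len(landmarks)))
        K_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, eps))) @ vecs.T
        K_nm = rbf_kernel(X, landmarks, gamma)
        return K_nm @ K_inv_sqrt  # shape: (n_samples, n_landmarks)

    # Example: embed observed contexts, then run any linear bandit update on phi(x).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 10))                     # contexts observed so far
    landmarks = X[rng.choice(len(X), 20, replace=False)]   # uniformly sampled landmarks
    Phi = nystrom_embedding(X, landmarks)
    print(Phi.shape)  # (500, 20): compact features agents could exchange

In this sketch the communication payload per context is a 20-dimensional vector rather than the raw data, which is the source of the communication savings the abstract refers to; how and when the landmark set is refreshed is what the paper's adaptive update addresses.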