论文标题
合作半伴侣的有效算法
An Efficient Algorithm for Cooperative Semi-Bandits
论文作者
论文摘要
我们考虑在通信代理网络上进行异步在线组合优化的问题。在每个时间步骤中,某些代理被随机激活,要求进行预测,并且系统支付相应的损失。然后,活跃代理商的邻居会收到半一样的反馈,并交换一些简洁的本地信息。目的是最大程度地减少网络遗憾,该遗憾定义为从组合决策集中选择的活跃代理预测的累积损失和事后观看最佳动作的累积损失之间的差异。在这种情况下,主要的挑战是控制所得算法的计算复杂性,同时保留最小值最佳遗憾保证。我们介绍了Coop-FTPL,这是众所周知的关注领导者算法的合作版本,该算法实现了新的损失估算程序,将NEU和BART {ó} K [2013]的几何再采样概括为我们的设置。假设决策集的元素是k维二进制向量,最多具有M non-Zero条目和$α$ 1的元素是网络的独立性数字,我们表明,T时间步骤后我们算法的预期遗憾是Q Mkt log(k $α$ 1 /Q + M),Q $α$ 1 /Q + M),其中Q是总计量的总计激活量质量。此外,我们证明,这仅是$ \ sqrt $ k log k-away,而最佳可实现的速率则是最先进的t 3/2最糟糕的计算复杂性。
We consider the problem of asynchronous online combinatorial optimization on a network of communicating agents. At each time step, some of the agents are stochastically activated, requested to make a prediction, and the system pays the corresponding loss. Then, neighbors of active agents receive semi-bandit feedback and exchange some succinct local information. The goal is to minimize the network regret, defined as the difference between the cumulative loss of the predictions of active agents and that of the best action in hindsight, selected from a combinatorial decision set. The main challenge in such a context is to control the computational complexity of the resulting algorithm while retaining minimax optimal regret guarantees. We introduce Coop-FTPL, a cooperative version of the well-known Follow The Perturbed Leader algorithm, that implements a new loss estimation procedure generalizing the Geometric Resampling of Neu and Bart{ó}k [2013] to our setting. Assuming that the elements of the decision set are k-dimensional binary vectors with at most m non-zero entries and $α$ 1 is the independence number of the network, we show that the expected regret of our algorithm after T time steps is of order Q mkT log(k)(k$α$ 1 /Q + m), where Q is the total activation probability mass. Furthermore, we prove that this is only $\sqrt$ k log k-away from the best achievable rate and that Coop-FTPL has a state-of-the-art T 3/2 worst-case computational complexity.