Paper Title
The Combinatorial Multi-Bandit Problem and its Application to Energy Management
Paper Authors
Paper Abstract
We study a Combinatorial Multi-Bandit Problem motivated by applications in energy systems management. Given multiple probabilistic multi-arm bandits with unknown outcome distributions, the task is to optimize the value of a combinatorial objective function mapping the vector of individual bandit outcomes to a single scalar reward. Unlike in single-bandit problems with multi-dimensional action space, the outcomes of the individual bandits are observable in our setting and the objective function is known. Guided by the hypothesis that individual observability enables better trade-offs between exploration and exploitation, we generalize the lower regret bound for single bandits, showing that indeed for multiple bandits it admits parallelized exploration. For our energy management application we propose a range of algorithms that combine exploration principles for multi-arm bandits with mathematical programming. In an experimental study we demonstrate the effectiveness of our approach to learn action assignments for 150 bandits, each having 24 actions, within a horizon of 365 episodes.
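The setting described in the abstract (several bandits played in every episode, each outcome observed individually, and a known objective mapping the outcome vector to a scalar reward) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: Bernoulli outcomes, a plain sum as the objective, and independent UCB1 per bandit. The function name `run_parallel_ucb` is hypothetical, and this is not the paper's algorithm, which combines exploration principles with mathematical programming.

```python
import math
import random


def run_parallel_ucb(means, horizon, objective=sum, seed=0):
    """Play every bandit once per episode, using UCB1 independently per bandit.

    means[i][a] is the hidden Bernoulli success probability of arm a of
    bandit i (an illustrative assumption); the learner only sees samples.
    """
    rng = random.Random(seed)
    n_bandits = len(means)
    counts = [[0] * len(m) for m in means]   # pulls per (bandit, arm)
    sums = [[0.0] * len(m) for m in means]   # summed outcomes per (bandit, arm)
    rewards = []

    for t in range(1, horizon + 1):
        outcomes = []
        for i in range(n_bandits):
            # Try each arm once, then pick the arm with the largest UCB1 index.
            untried = [a for a, c in enumerate(counts[i]) if c == 0]
            if untried:
                arm = untried[0]
            else:
                arm = max(
                    range(len(means[i])),
                    key=lambda a: sums[i][a] / counts[i][a]
                    + math.sqrt(2.0 * math.log(t) / counts[i][a]),
                )
            outcome = 1.0 if rng.random() < means[i][arm] else 0.0
            # Individual observability: each outcome updates its own bandit's stats.
            counts[i][arm] += 1
            sums[i][arm] += outcome
            outcomes.append(outcome)
        # The known objective maps the outcome vector to one scalar reward.
        rewards.append(objective(outcomes))
    return counts, rewards
```

Calling `run_parallel_ucb([[0.1, 0.9], [0.8, 0.2]], horizon=365)` returns the per-arm pull counts and the per-episode rewards. Because every bandit receives its own feedback in each episode, all bandits explore in parallel rather than sharing a single scalar signal. A non-separable or constrained objective would require replacing the per-bandit `max` with a joint optimization step.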