随机网络实用程序最大化，未知公用事业：多臂匪徒方法

论文标题

随机网络实用程序最大化，未知公用事业：多臂匪徒方法

Stochastic Network Utility Maximization with Unknown Utilities: Multi-Armed Bandits Approach

论文作者

Verma, Arun, Hanawal, Manjesh K.

论文摘要

在本文中，我们研究了一个新型的随机网络实用性最大化（NUM）问题，其中剂的实用性未知。每个代理的效用取决于它从网络运营商/控制器中收到的资源量。操作员希望进行资源分配，以最大化网络的预期总实用性。我们考虑阈值类型实用程序功能，如果其收到的资源量高于某个阈值，则每个代理都会获得非零实用程序。否则，其效用为零（实时硬实时）。我们将这种NUM设置与未知公用事业相同，作为一个遗憾的最小化问题。我们的目标是确定作为“良好”的策略，作为知道代理商实用程序的甲骨文政策。我们将此问题设置建模为强盗设置，其中每个回合中获得的反馈取决于分配给代理的资源。我们使用多部多臂匪徒和组合半伴侣的想法为这种新颖的设置提出了算法。我们表明，当所有试剂具有相同的实用程序时，提出的算法都是最佳的。我们通过数值实验来验证我们提出的算法的性能保证。

In this paper, we study a novel Stochastic Network Utility Maximization (NUM) problem where the utilities of agents are unknown. The utility of each agent depends on the amount of resource it receives from a network operator/controller. The operator desires to do a resource allocation that maximizes the expected total utility of the network. We consider threshold type utility functions where each agent gets non-zero utility if the amount of resource it receives is higher than a certain threshold. Otherwise, its utility is zero (hard real-time). We pose this NUM setup with unknown utilities as a regret minimization problem. Our goal is to identify a policy that performs as `good' as an oracle policy that knows the utilities of agents. We model this problem setting as a bandit setting where feedback obtained in each round depends on the resource allocated to the agents. We propose algorithms for this novel setting using ideas from Multiple-Play Multi-Armed Bandits and Combinatorial Semi-Bandits. We show that the proposed algorithm is optimal when all agents have the same utility. We validate the performance guarantees of our proposed algorithms through numerical experiments.

下载PDF全文

下载文献需遵守相关版权规定

论文标题