Paper Title

Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes

Paper Authors

Yuval Emek, Ron Lavi, Rad Niazadeh, Yangguang Shi

Paper Abstract

In this paper, a rather general online problem called dynamic resource allocation with capacity constraints (DRACC) is introduced and studied in the realm of posted price mechanisms. This problem subsumes several applications of stateful pricing, including but not limited to posted prices for online job scheduling and matching over a dynamic bipartite graph. As the existing online learning techniques do not yield vanishing-regret mechanisms for this problem, we develop a novel online learning framework defined over deterministic Markov decision processes with dynamic state transition and reward functions. We then prove that if the Markov decision process is guaranteed to admit an oracle that can simulate any given policy from any initial state with bounded loss -- a condition that is satisfied in the DRACC problem -- then the online learning problem can be solved with vanishing regret. Our proof technique is based on a reduction to online learning with switching cost, in which an online decision maker incurs an extra cost every time she switches from one arm to another. We formally demonstrate this connection and further show how DRACC can be used in our proposed applications of stateful pricing.
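
The reduction mentioned in the abstract targets online learning with switching cost: over T rounds, the learner's cumulative loss should trail that of the best fixed arm by only o(T) (vanishing regret), even after paying an extra cost for every arm change. As a point of reference only, and not the paper's own algorithm, below is a minimal Python sketch of one standard technique for this setting, the "shrinking dartboard" variant of multiplicative weights, which preserves the Hedge marginals at every round while resampling the arm only rarely. The function and parameter names (shrinking_dartboard, eps) are illustrative assumptions, not from the paper.

    import random

    def shrinking_dartboard(loss_rounds, n_arms, eps, seed=0):
        """Multiplicative weights with the 'shrinking dartboard' resampling
        rule: the per-round marginal matches the Hedge distribution, but the
        arm is resampled only rarely, keeping the switching cost small."""
        rng = random.Random(seed)
        weights = [1.0] * n_arms
        arm = rng.randrange(n_arms)         # initial weights are uniform
        total_loss, switches = 0.0, 0
        for losses in loss_rounds:          # each losses[i] lies in [0, 1]
            total_loss += losses[arm]
            new_weights = [w * (1.0 - eps) ** l
                           for w, l in zip(weights, losses)]
            # Weights only shrink, so staying with probability new_w/old_w
            # and otherwise resampling from the fresh distribution keeps the
            # round-t marginal equal to the Hedge distribution.
            if rng.random() > new_weights[arm] / weights[arm]:
                r, acc = rng.random() * sum(new_weights), 0.0
                for i, w in enumerate(new_weights):
                    acc += w
                    if r <= acc:
                        switches += i != arm  # resample may keep the same arm
                        arm = i
                        break
            weights = new_weights
        return total_loss, switches

    # Example: 3 arms, 10,000 rounds of alternating losses.
    rounds = [[(t + i) % 2 * 0.5 for i in range(3)] for t in range(10000)]
    print(shrinking_dartboard(rounds, n_arms=3, eps=0.01))

With eps of order sqrt(log(n_arms)/T), this scheme is known to make both the regret against the best fixed arm and the expected number of switches grow only sublinearly in T, which is the kind of guarantee a switching-cost learner must provide for a reduction of the sort described in the abstract.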
