Paper Title

Active Exploration via Experiment Design in Markov Chains

Authors

Mojmír Mutný, Tadeusz Janik, Andreas Krause

Abstract

A key challenge in science and engineering is to design experiments to learn about some unknown quantity of interest. Classical experimental design optimally allocates the experimental budget to maximize a notion of utility (e.g., reduction in uncertainty about the unknown quantity). We consider a rich setting, where the experiments are associated with states in a Markov chain, and we can only choose them by selecting a policy controlling the state transitions. This problem captures important applications, from exploration in reinforcement learning to spatial monitoring tasks. We propose an algorithm, Markov-Design, that efficiently selects policies whose measurement allocation provably converges to the optimal one. The algorithm is sequential in nature, adapting its choice of policies (experiments) informed by past measurements. In addition to our theoretical analysis, we showcase our framework on applications in ecological surveillance and pharmacology.
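To make the "classical experimental design" baseline mentioned in the abstract concrete, here is a minimal sketch of a standard D-optimal design solved with the Frank-Wolfe method: it allocates a budget over candidate experiments so as to maximize the log-determinant of the resulting information matrix. This is a generic illustration of budget allocation in experimental design, not the paper's Markov-Design algorithm (which additionally constrains allocations to be realizable by policies of a Markov chain); the function name and step-size choice are illustrative assumptions.

```python
import numpy as np

def d_optimal_design(X, n_iters=200):
    """Frank-Wolfe for a D-optimal design (illustrative sketch):
    maximize log det(sum_i w_i x_i x_i^T) over the probability simplex.

    X: (n, d) array whose rows are candidate experiment feature vectors.
    Returns the measurement allocation w (weights summing to 1).
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)            # start from the uniform allocation
    for t in range(n_iters):
        M = X.T @ (w[:, None] * X)     # information matrix under allocation w
        M_inv = np.linalg.inv(M)
        # Gradient of log det(M) w.r.t. w_i is x_i^T M^{-1} x_i.
        g = np.einsum('ij,jk,ik->i', X, M_inv, X)
        i = np.argmax(g)               # linear maximizer: a single experiment
        step = 2.0 / (t + 3)           # standard Frank-Wolfe step size
        w = (1.0 - step) * w
        w[i] += step                   # move toward the chosen vertex
    return w
```

For example, with candidates along the two coordinate axes plus a redundant diagonal direction, the allocation concentrates on the two informative axis experiments. The paper's setting replaces this free choice of `w` with allocations induced by state-visitation distributions of policies, selected sequentially.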
