论文标题
有效计划限制的有效算法
Efficient Algorithms for Planning with Participation Constraints
论文作者
论文摘要
我们考虑[Zhang等,2022]中引入的参与限制的计划问题。在这个问题中,校长在马尔可夫决策过程中选择了行动,从而为委托人和代理人提供了单独的公用事业。但是,代理可以并且将选择在其预期的效用变为负面时结束该过程。委托人试图在代理人应该始终希望继续参与的限制下计算并承诺最大化其预期效用的政策。我们为有限 - 霍森设置的此问题提供了第一个多项式时间精确算法,以前仅知道添加剂$ \ varepsilon $ -Approximation算法。我们的方法也可以扩展到(打折的)无限 - 摩恩盒,为此,我们提供了一种算法,该算法在输入的大小和$ \ log(1/\ varepsilon)$的时间内运行,并返回一项最佳的策略,该策略是$ \ varepsilon $的附加误差。
We consider the problem of planning with participation constraints introduced in [Zhang et al., 2022]. In this problem, a principal chooses actions in a Markov decision process, resulting in separate utilities for the principal and the agent. However, the agent can and will choose to end the process whenever his expected onward utility becomes negative. The principal seeks to compute and commit to a policy that maximizes her expected utility, under the constraint that the agent should always want to continue participating. We provide the first polynomial-time exact algorithm for this problem for finite-horizon settings, where previously only an additive $\varepsilon$-approximation algorithm was known. Our approach can also be extended to the (discounted) infinite-horizon case, for which we give an algorithm that runs in time polynomial in the size of the input and $\log(1/\varepsilon)$, and returns a policy that is optimal up to an additive error of $\varepsilon$.