Paper Title
Exact Asymptotics for Linear Quadratic Adaptive Control
Paper Authors
Paper Abstract
Recent progress in reinforcement learning has led to remarkable performance in a range of applications, but its deployment in high-stakes settings remains quite rare. One reason is a limited understanding of the behavior of reinforcement algorithms, both in terms of their regret and their ability to learn the underlying system dynamics---existing work is focused almost exclusively on characterizing rates, with little attention paid to the constants multiplying those rates that can be critically important in practice. To start to address this challenge, we study perhaps the simplest non-bandit reinforcement learning problem: linear quadratic adaptive control (LQAC). By carefully combining recent finite-sample performance bounds for the LQAC problem with a particular (less-recent) martingale central limit theorem, we are able to derive asymptotically-exact expressions for the regret, estimation error, and prediction error of a rate-optimal stepwise-updating LQAC algorithm. In simulations on both stable and unstable systems, we find that our asymptotic theory also describes the algorithm's finite-sample behavior remarkably well.
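To make the problem setting concrete, here is a minimal Python sketch of a certainty-equivalence LQ adaptive controller that re-estimates the dynamics by least squares at every step and tracks a simple regret proxy and estimation error. This is an assumption-laden illustration, not the paper's rate-optimal stepwise-updating algorithm: the example system matrices, the exploration-noise schedule, and the per-state regret comparison are hypothetical choices made only to show the quantities (regret, estimation error) that the abstract refers to.

```python
# A minimal, illustrative certainty-equivalence LQ adaptive control loop
# (a sketch only; NOT the paper's rate-optimal stepwise-updating algorithm).
# At each step the controller refits the unknown dynamics (A, B) by least
# squares, solves the discrete-time Riccati equation for the estimate, and
# applies the resulting feedback gain plus shrinking exploration noise.
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)

# True system x_{t+1} = A x_t + B u_t + w_t (unknown to the controller);
# the specific matrices below are assumptions chosen only for illustration.
A = np.array([[1.01, 0.10], [0.00, 0.98]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)  # quadratic stage-cost weights

def lqr_gain(A_hat, B_hat):
    """Feedback gain K (u = -K x) from the discrete algebraic Riccati equation."""
    P = solve_discrete_are(A_hat, B_hat, Q, R)
    return np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)

K_opt = lqr_gain(A, B)        # optimal gain for the true system
K_hat = np.zeros((1, 2))      # initial, uninformed controller
x = np.zeros((2, 1))
Z, X_next = [], []            # regression data: [x_t; u_t] -> x_{t+1}
regret = 0.0

for t in range(1, 5001):
    # Exploration noise with a decaying scale (schedule chosen only for illustration)
    u = -K_hat @ x + rng.normal(scale=t ** -0.25, size=(1, 1))
    cost = (x.T @ Q @ x + u.T @ R @ u).item()
    # Simple per-state comparison against the optimal gain, used as a regret proxy
    opt_cost = (x.T @ (Q + K_opt.T @ R @ K_opt) @ x).item()
    regret += cost - opt_cost

    w = rng.normal(scale=0.1, size=(2, 1))
    x_next = A @ x + B @ u + w

    # Stepwise update: least-squares refit of (A, B) on all data so far
    Z.append(np.vstack([x, u]).ravel())
    X_next.append(x_next.ravel())
    theta, *_ = np.linalg.lstsq(np.array(Z), np.array(X_next), rcond=None)
    A_hat, B_hat = theta.T[:, :2], theta.T[:, 2:]
    try:
        K_hat = lqr_gain(A_hat, B_hat)  # certainty-equivalence controller
    except (np.linalg.LinAlgError, ValueError):
        pass  # keep the previous gain if the Riccati solve fails early on
    x = x_next

est_err = np.linalg.norm(np.hstack([A_hat, B_hat]) - np.hstack([A, B]))
print(f"cumulative regret proxy: {regret:.1f}, estimation error: {est_err:.3f}")
```

Under such a scheme, the cumulative regret proxy grows sublinearly while the estimation error of (A, B) shrinks; the paper's contribution is to pin down not just these rates but the exact asymptotic constants in front of them.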