Learning to Optimize in Model Predictive Control

Paper Title

Learning to Optimize in Model Predictive Control

Authors

Jacob Sacks, Byron Boots

Abstract

Sampling-based Model Predictive Control (MPC) is a flexible control framework that can reason about non-smooth dynamics and cost functions. Recently, significant work has focused on the use of machine learning to improve the performance of MPC, often through learning or fine-tuning the dynamics or cost function. In contrast, we focus on learning to optimize more effectively, that is, on improving the update rule within MPC. We show that this can be particularly useful in sampling-based MPC, where we often wish to minimize the number of samples for computational reasons. Unfortunately, the cost of computational efficiency is a reduction in performance: fewer samples result in noisier updates. We show that we can contend with this noise by learning how to update the control distribution more effectively and make better use of the few samples that we have. Our learned controllers are trained via imitation learning to mimic an expert that has access to substantially more samples. We test the efficacy of our approach on multiple simulated robotics tasks in sample-constrained regimes and demonstrate that our approach can outperform an MPC controller with the same number of samples.
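
The abstract does not spell out the update rule, so the following is only an illustrative sketch of why fewer samples give noisier updates. It assumes an MPPI-style exponentially weighted update of the mean of a Gaussian control distribution, which is one common sampling-based MPC update and not necessarily the one used in the paper. The 1-D integrator dynamics, the quadratic cost, and all names and parameter values (`rollout_cost`, `mppi_update`, `update_spread`, `sigma`, `temperature`) are hypothetical.

```python
import numpy as np


def rollout_cost(controls, x0=0.0, target=1.0):
    """Cost of control sequences on a hypothetical 1-D integrator x_{t+1} = x_t + u_t.

    `controls` has shape (num_samples, horizon); returns one cost per sample.
    """
    states = x0 + np.cumsum(controls, axis=-1)
    tracking = np.sum((states - target) ** 2, axis=-1)
    effort = 0.01 * np.sum(controls ** 2, axis=-1)
    return tracking + effort


def mppi_update(mean, num_samples, rng, sigma=0.5, temperature=10.0):
    """One exponentially weighted update of the control mean (assumed MPPI-style rule)."""
    noise = rng.normal(0.0, sigma, size=(num_samples, mean.shape[0]))
    candidates = mean + noise
    costs = rollout_cost(candidates)
    # Subtract the minimum cost for numerical stability before exponentiating.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    return weights @ candidates


def update_spread(num_samples, trials=200, horizon=10, seed=0):
    """Average standard deviation of the updated mean over repeated trials.

    This serves as a simple proxy for how noisy a single update is.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(horizon)
    updates = np.stack([mppi_update(mean, num_samples, rng) for _ in range(trials)])
    return float(updates.std(axis=0).mean())


if __name__ == "__main__":
    for n in (8, 64, 512):
        print(f"samples={n:4d}  update spread={update_spread(n):.4f}")
```

In this toy setting, the spread of the updated mean shrinks as the sample count grows. That is the trade-off the abstract describes: the learned update rule is meant to recover, with few samples, behavior closer to that of an expert with many samples.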
