Reinforcement Learning in System Identification

Paper Title

Reinforcement Learning in System Identification

Authors

Jose Antonio Martin H., Oscar Fernandez Vicente, Sergio Perez, Anas Belfadil, Cristina Ibanez-Llano, Freddy Jose Perozo Rondon, Jose Javier Valle, Javier Arechalde Pelaz

Abstract

System identification, also known as learning forward models, transfer functions, or system dynamics, has a long tradition in science and engineering across many fields. In particular, it is a recurring theme in Reinforcement Learning (RL) research, where forward models approximate the state transition function of a Markov Decision Process by learning a mapping from the current state and action to the next state. This problem is commonly posed directly as a Supervised Learning problem. That approach faces several difficulties due to the inherent complexity of the dynamics to be learned, for example delayed effects, high non-linearity, non-stationarity, partial observability and, more importantly, error accumulation over long time horizons when using bootstrapped predictions (predictions based on past predictions). Here we explore the use of Reinforcement Learning for this problem. We elaborate on why and how this problem fits naturally and soundly as a Reinforcement Learning problem, and present some experimental results that demonstrate RL is a promising technique for solving this kind of problem.
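
The error accumulation mentioned in the abstract can be illustrated with a small sketch. This is not the authors' method or experimental setup; it only contrasts one-step prediction (the model always starts from the true state) with a bootstrapped rollout (the model is fed its own previous prediction). The scalar dynamics, the coefficients, and the function names `true_step`, `model_step`, `one_step_errors`, and `rollout_errors` are all assumptions made for illustration.

```python
import math


def true_step(state, action):
    # Hypothetical ground-truth dynamics: a lightly damped scalar system.
    return 0.99 * state + 0.1 * action


def model_step(state, action):
    # Hypothetical learned forward model with a small parameter error (0.97 vs 0.99).
    return 0.97 * state + 0.1 * action


def one_step_errors(s0, actions):
    # The model predicts one step ahead from the TRUE current state each time.
    errors, s = [], s0
    for a in actions:
        s_next = true_step(s, a)
        errors.append(abs(model_step(s, a) - s_next))
        s = s_next
    return errors


def rollout_errors(s0, actions):
    # Bootstrapped rollout: the model is fed its OWN previous prediction.
    errors, s_true, s_pred = [], s0, s0
    for a in actions:
        s_true = true_step(s_true, a)
        s_pred = model_step(s_pred, a)
        errors.append(abs(s_pred - s_true))
    return errors


# Assumed input signal: a slow sinusoid over a 50-step horizon.
actions = [math.sin(0.1 * t) for t in range(50)]
one_step = one_step_errors(1.0, actions)
rollout = rollout_errors(1.0, actions)
print("max one-step error:", max(one_step))
print("max rollout error: ", max(rollout))
```

Both error sequences coincide at the first step, since the rollout has not yet consumed any of its own predictions. After that, the rollout error compounds, which is why a model that looks accurate under a one-step supervised loss can still drift badly over a long horizon.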
