Paper Title
Simultaneously Evolving Deep Reinforcement Learning Models using Multifactorial Optimization
Paper Authors
Paper Abstract
In recent years, Multifactorial Optimization (MFO) has gained notable momentum in the research community. MFO is known for its inherent capability to efficiently address multiple optimization tasks at the same time, while transferring information among such tasks to improve their convergence speed. On the other hand, the quantum leap made by Deep Q Learning (DQL) in the Machine Learning field has made it possible to tackle Reinforcement Learning (RL) problems of unprecedented complexity. Unfortunately, complex DQL models usually find it difficult to converge to optimal policies due to insufficient exploration or sparse rewards. In order to overcome these drawbacks, pre-trained models are widely harnessed via Transfer Learning, extrapolating knowledge acquired in a source task to the target task. Moreover, meta-heuristic optimization has been shown to mitigate the exploration deficiencies of DQL models. This work proposes an MFO framework capable of simultaneously evolving several DQL models towards solving interrelated RL tasks. Specifically, our proposed framework blends together the benefits of meta-heuristic optimization, Transfer Learning and DQL to automate the knowledge transfer and policy learning of distributed RL agents. A thorough set of experiments is presented and discussed to assess the performance of the framework, compare it to the traditional Transfer Learning methodology in terms of convergence, speed and policy quality, and examine the intertask relationships found and exploited over the search process.
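The core mechanism the abstract describes — one evolutionary population optimizing several tasks at once, where each individual carries a "skill factor" (the task it is evaluated on) and occasional cross-task mating transfers knowledge between tasks — can be sketched as follows. This is a hypothetical, MFEA-style toy illustration, not the paper's implementation: the quadratic fitness functions `task_a` and `task_b` stand in for the episode returns of DQL agents parameterized by the evolved vectors, and all constants are illustrative.

```python
import random

DIM = 5     # size of the (toy) parameter vector
POP = 20    # population size
GENS = 30   # number of generations
RMP = 0.3   # random mating probability across tasks

# Stand-ins for two interrelated RL tasks: two nearby quadratic optima.
# A real framework would replace these with environment rollouts of DQL agents.
def task_a(x):
    return -sum((v - 1.0) ** 2 for v in x)

def task_b(x):
    return -sum((v - 1.2) ** 2 for v in x)

TASKS = [task_a, task_b]

def crossover(p, q):
    # Uniform crossover followed by small Gaussian mutation.
    child = [pi if random.random() < 0.5 else qi for pi, qi in zip(p, q)]
    return [v + random.gauss(0, 0.05) for v in child]

def evolve(seed=0):
    random.seed(seed)
    # Each individual is a (vector, skill_factor) pair.
    pop = [([random.uniform(-1, 1) for _ in range(DIM)], i % 2)
           for i in range(POP)]

    def bests():
        # Best fitness per task among individuals assigned to that task.
        return [max(TASKS[t](x) for x, s in pop if s == t) for t in (0, 1)]

    start = bests()
    for _ in range(GENS):
        offspring = []
        for _ in range(POP):
            (p, sp), (q, sq) = random.sample(pop, 2)
            # Same-task parents always mate; cross-task mating (the
            # knowledge-transfer step) happens with probability RMP.
            if sp == sq or random.random() < RMP:
                offspring.append((crossover(p, q), random.choice([sp, sq])))
        pop.extend(offspring)
        # Elitist selection per task: offspring are evaluated only on the
        # task inherited from a parent, so transfer is implicit in mating.
        survivors = []
        for t in (0, 1):
            group = [ind for ind in pop if ind[1] == t]
            group.sort(key=lambda ind: TASKS[t](ind[0]), reverse=True)
            survivors.extend(group[:POP // 2])
        pop = survivors
    return start, bests()
```

Because selection is elitist within each task, the best fitness per task never degrades; the `RMP` parameter controls how aggressively genetic material crosses task boundaries, which is the knob that implicit intertask transfer hinges on.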