Paper Title

XCS Classifier System with Experience Replay

Authors

Anthony Stein, Roland Maier, Lukas Rosenbauer, Jörg Hähner

Abstract

XCS constitutes the most deeply investigated classifier system today. It bears strong potential and comes with inherent capabilities for mastering a variety of different learning tasks. Besides outstanding successes in various classification and regression tasks, XCS has also proved very effective in certain multi-step environments from the domain of reinforcement learning. Especially in the latter domain, recent advances have been driven mainly by algorithms that model their policies with deep neural networks, among which the Deep Q-Network (DQN) is a prominent representative. Experience Replay (ER) constitutes one of the crucial factors behind the DQN's success, since it facilitates stable training of the neural-network-based Q-function approximators. Surprisingly, XCS barely takes advantage of similar mechanisms that leverage the raw experiences stored so far. To bridge this gap, this paper investigates the benefits of extending XCS with ER. On the one hand, we demonstrate that for single-step tasks ER bears massive potential for improvements in sample efficiency. On the downside, however, we reveal that the use of ER may further aggravate well-studied issues that remain unsolved for XCS when it is applied to sequential decision problems demanding long action chains.
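
To make the ER idea referenced above concrete, the following minimal sketch (in Python) illustrates the core mechanism: raw transitions are stored in a bounded buffer and later re-sampled to drive additional learning updates. The class and method names (`ReplayBuffer`, `store`, `sample`, a generic `learner.update`) are assumptions for illustration only, not the authors' implementation or the XCS-specific extension studied in the paper.

```python
# Illustrative Experience Replay buffer (not the paper's actual code):
# raw experiences (state, action, reward, next_state, done) are stored
# and replayed in random mini-batches to stabilize and speed up learning.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are evicted first

    def store(self, state, action, reward, next_state, done):
        """Record one raw experience tuple."""
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a uniform random mini-batch of stored experiences for replay."""
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

# Hypothetical usage: after each environment interaction, store the transition,
# then replay a small batch to update the learner (e.g. classifier predictions
# in an XCS-style system, or a Q-network in DQN).
# buffer.store(s, a, r, s_next, done)
# for s, a, r, s_next, done in buffer.sample(32):
#     learner.update(s, a, r, s_next, done)
```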
