Title
System Design for an Integrated Lifelong Reinforcement Learning Agent for Real-Time Strategy Games
Authors
Abstract
As artificial and robotic systems are increasingly deployed and relied upon for real-world applications, it is important that they can continually learn and adapt in dynamically changing environments, becoming Lifelong Learning Machines. Continual/lifelong learning (LL) involves minimizing catastrophic forgetting of old tasks while maximizing a model's capability to learn new tasks. This paper addresses the challenging lifelong reinforcement learning (L2RL) setting. Pushing the state of the art forward in L2RL and making L2RL useful for practical applications requires more than developing individual L2RL algorithms; it requires progress at the systems level, especially research into the non-trivial problem of how to integrate multiple L2RL algorithms into a common framework. In this paper, we introduce the Lifelong Reinforcement Learning Components Framework (L2RLCF), which standardizes L2RL systems and assimilates different continual learning components (each addressing a different aspect of the lifelong learning problem) into a unified system. As an instantiation of L2RLCF, we develop a standard API that allows easy integration of novel lifelong learning components. We describe a case study that demonstrates how multiple independently developed LL components can be integrated into a single realized system. We also introduce an evaluation environment to measure the effect of combining various system components. Our evaluation environment employs different LL scenarios (sequences of tasks) consisting of Starcraft-2 minigames and allows for fair, comprehensive, and quantitative comparison of different combinations of components within a challenging common evaluation environment.