Paper Title
On Simple Reactive Neural Networks for Behaviour-Based Reinforcement Learning
Paper Authors
Paper Abstract
We present a behaviour-based reinforcement learning approach, inspired by Brooks' subsumption architecture, in which simple fully connected networks are trained as reactive behaviours. Our working assumption is that a pick-and-place robotic task can be simplified by leveraging the domain knowledge of a robotics developer to decompose and train such reactive behaviours; namely, approach, grasp, and retract. The robot then autonomously learns how to combine them via an Actor-Critic architecture. The Actor-Critic policy determines the activation and inhibition mechanisms of the reactive behaviours in a particular temporal sequence. We validate our approach in a simulated robot environment where the task is to pick up a block and take it to a target position while orienting the gripper from a top grasp. The latter represents an extra degree of freedom to which current end-to-end reinforcement learning approaches fail to generalise. Our findings suggest that robotic learning can be more effective if each behaviour is learnt in isolation and the behaviours are then combined to accomplish the task. That is, our approach learns the pick-and-place task in 8,000 episodes, a drastic reduction in the number of training episodes required by end-to-end approaches and existing state-of-the-art algorithms.
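The paper itself does not include code; the sketch below is a minimal PyTorch illustration of the described architecture, assuming three small fully connected behaviour networks (approach, grasp, retract) and a discrete Actor-Critic selector that activates one behaviour per timestep while inhibiting the rest. All class names (`BehaviourNet`, `Selector`) and dimensions are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class BehaviourNet(nn.Module):
    """Simple fully connected reactive behaviour: observation -> action."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

class Selector(nn.Module):
    """Actor-Critic head whose discrete action activates one behaviour
    and inhibits the others at each timestep."""
    def __init__(self, obs_dim: int, n_behaviours: int):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.actor = nn.Linear(64, n_behaviours)  # logits over behaviours
        self.critic = nn.Linear(64, 1)            # state-value estimate

    def forward(self, obs):
        h = self.trunk(obs)
        return torch.distributions.Categorical(logits=self.actor(h)), self.critic(h)

# One control step: the selector picks a behaviour; only that behaviour's
# action is sent to the robot (the others produce no output).
obs_dim, act_dim = 12, 4  # assumed dimensions
behaviours = [BehaviourNet(obs_dim, act_dim) for _ in range(3)]  # approach, grasp, retract
selector = Selector(obs_dim, n_behaviours=len(behaviours))

obs = torch.randn(obs_dim)
dist, value = selector(obs)
idx = dist.sample()            # activation decision (trained with Actor-Critic updates)
action = behaviours[idx](obs)  # inhibited behaviours are simply not evaluated
```

In this reading, the pretrained behaviour networks stay frozen and only the selector is trained, which is one plausible way the reported reduction in training episodes could arise; the paper should be consulted for the actual training details.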