Paper Title
Reinforcement Learning with Brain-Inspired Modulation can Improve Adaptation to Environmental Changes

Paper Authors

Eric Chalmers, Artur Luczak

Abstract

Developments in reinforcement learning (RL) have allowed algorithms to achieve impressive performance in highly complex, but largely static, problems. In contrast, biological learning seems to value efficiency of adaptation to a constantly changing world. Here we build on a recently proposed neuronal learning rule that assumes each neuron can optimize its energy balance by predicting its own future activity. That assumption leads to a neuronal learning rule that uses presynaptic input to modulate prediction error. We argue that an analogous RL rule would use action probability to modulate reward prediction error. This modulation makes the agent more sensitive to negative experiences and more careful in forming preferences. We embed the proposed rule in both tabular and deep Q-network RL algorithms, and find that it outperforms conventional algorithms in simple but highly dynamic tasks. We suggest that the new rule encapsulates a core principle of biological intelligence: an important component for allowing algorithms to adapt to change in a human-like way.
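To make the idea concrete, the following is a minimal sketch of a tabular Q-learning step in which the reward prediction error is modulated by the probability of the chosen action. The abstract does not give the exact functional form of the rule, so the specific choice here (damping only positive errors by the softmax action probability, so that negative experiences pass through at full strength) is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def softmax(q, temp=1.0):
    """Softmax action probabilities from a row of Q-values."""
    z = (q - q.max()) / temp          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def modulated_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning step with action-probability modulation.

    Illustrative sketch only: the positive part of the TD error is
    scaled by pi(a|s), making the agent slower to form preferences
    but fully sensitive to negative prediction errors.
    """
    p = softmax(Q[s])[a]                             # probability of the action taken
    delta = r + gamma * Q[s_next].max() - Q[s, a]    # standard TD error
    if delta > 0:
        delta *= p                                   # damp positive surprises (assumed form)
    Q[s, a] += alpha * delta
    return Q
```

With a two-state, two-action table initialized to zero, a reward of +1 is discounted by the uniform action probability (0.5), while a reward of -1 updates the value at the full learning rate, reflecting the asymmetric sensitivity described in the abstract.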