Paper Title

Reward Shaping for Human Learning via Inverse Reinforcement Learning

Paper Authors

Rucker, Mark A., Watson, Layne T., Gerber, Matthew S., Barnes, Laura E.

Paper Abstract

Humans are spectacular reinforcement learners, constantly learning from and adjusting to experience and feedback. Unfortunately, this does not necessarily mean humans are fast learners. When tasks are challenging, learning can become unacceptably slow. Fortunately, humans do not have to learn tabula rasa, and learning speed can be greatly increased with learning aids. In this work we validate a new type of learning aid -- reward shaping for humans via inverse reinforcement learning (IRL). The goal of this aid is to increase the speed with which humans can learn good policies for specific tasks. Furthermore, this approach complements alternative machine learning techniques, such as safety features, that try to prevent individuals from making poor decisions. To achieve our results, we first extend a well-known IRL algorithm via kernel methods. Afterwards, we conduct two human subjects experiments using an online game in which players have limited time to learn a good policy. We show with statistical significance that players who receive our learning aid are able to approach desired policies more quickly than the control group.
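For readers unfamiliar with the technique named in the abstract, the short Python sketch below illustrates potential-based reward shaping (Ng, Harada, and Russell, 1999), the standard formulation that reward-shaping aids of this kind build on: the environment reward is augmented with gamma * Phi(s') - Phi(s), which densifies the learning signal without changing the set of optimal policies. The potential function, states, and constants here are hypothetical illustrations, not the paper's implementation; in the paper's setting the shaping signal would instead be derived from a reward function recovered via IRL.

# Minimal sketch of potential-based reward shaping. All names and
# values (GAMMA, potential, the goal state) are illustrative
# assumptions, not taken from the paper.

GAMMA = 0.99  # discount factor (assumed)

def potential(state):
    # Hypothetical potential Phi(s): negative distance to a goal state.
    # In the paper's setting, Phi would come from an IRL-recovered
    # reward function rather than being hand-coded like this.
    goal = 10
    return -abs(goal - state)

def shaped_reward(reward, state, next_state):
    # Augment the environment reward with GAMMA * Phi(s') - Phi(s).
    # Shaping terms of this potential-based form provably preserve
    # the set of optimal policies while giving denser feedback.
    return reward + GAMMA * potential(next_state) - potential(state)

if __name__ == "__main__":
    # A learner stepping from state 3 to state 4, toward the goal at 10,
    # receives a positive shaping bonus even when the sparse reward is 0.
    print(shaped_reward(0.0, state=3, next_state=4))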
