Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning

Paper Title

Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning

Authors

Deborah Cohen, Moonkyung Ryu, Yinlam Chow, Orgad Keller, Ido Greenberg, Avinatan Hassidim, Michael Fink, Yossi Matias, Idan Szpektor, Craig Boutilier, Gal Elidan

Abstract

Despite recent advances in natural language understanding and generation, and decades of research on the development of conversational bots, building automated agents that can carry on rich open-ended conversations with humans "in the wild" remains a formidable challenge. In this work, we develop a real-time, open-ended dialogue system that uses reinforcement learning (RL) to power a bot's conversational skill at scale. Our work pairs the succinct embedding of the conversation state generated using SOTA (supervised) language models with RL techniques that are particularly suited to a dynamic action space that changes as the conversation progresses. Trained using crowd-sourced data, our novel system substantially exceeds the (strong) baseline supervised model with respect to several metrics of interest in a live experiment with real users of the Google Assistant.
