Reinforcement Learning of Multi-Domain Dialog Policies Via Action Embeddings

Paper Title

Reinforcement Learning of Multi-Domain Dialog Policies Via Action Embeddings

Authors

Jorge A. Mendez, Alborz Geramifard, Mohammad Ghavamzadeh, Bing Liu

Abstract

Learning task-oriented dialog policies via reinforcement learning typically requires large amounts of interaction with users, which in practice renders such methods unusable for real-world applications. In order to reduce the data requirements, we propose to leverage data from across different dialog domains, thereby reducing the amount of data required from each given domain. In particular, we propose to learn domain-agnostic action embeddings, which capture general-purpose structure that informs the system how to act given the current dialog context, and are then specialized to a specific domain. We show how this approach is capable of learning with significantly less interaction with users, with a reduction of 35% in the number of dialogs required to learn, and to a higher level of proficiency than training separate policies for each domain on a set of simulated domains.
