Jointly Reinforced User Simulator and Task-oriented Dialog System with Simplified Generative Architecture

Title

Jointly Reinforced User Simulator and Task-oriented Dialog System with Simplified Generative Architecture

Authors

Liu, Hong; Ou, Zhijian; Huang, Yi; Feng, Junlan

Abstract

Recently, there has been progress in supervised fine-tuning of pretrained GPT-2 to build end-to-end task-oriented dialog (TOD) systems. However, online reinforcement learning of a GPT-2 based dialog system (DS), together with an end-to-end user simulator (US), has never been explored. Moreover, a drawback of existing GPT-2 based TOD systems is that they mostly employ the whole dialog history as input, which is inefficient in both memory and compute. In this paper, we first propose Simplified Generative Architectures (SGA) for the DS and the US respectively, both based on GPT-2 but using a shortened history. Then, we successfully develop a Jointly Reinforced US and DS, called SGA-JRUD. Our DS with the proposed SGA, when trained with supervision only, achieves state-of-the-art performance on MultiWOZ2.1 and is more compute-efficient in both training and generation. Extensive experiments on MultiWOZ2.1 further show the superiority of SGA-JRUD in both offline and online evaluations.
