Paper Title
SPACE-3: Unified Dialog Model Pre-training for Task-Oriented Dialog Understanding and Generation
Paper Authors
Paper Abstract
Recently, pre-training methods have shown remarkable success in task-oriented dialog (TOD) systems. However, most existing pre-trained models for TOD focus on either dialog understanding or dialog generation, but not both. In this paper, we propose SPACE-3, a novel unified semi-supervised pre-trained conversation model that learns from large-scale dialog corpora with limited annotations and can be effectively fine-tuned on a wide range of downstream dialog tasks. Specifically, SPACE-3 consists of four successive components in a single transformer to maintain a task-flow in TOD systems: (i) a dialog encoding module to encode dialog history, (ii) a dialog understanding module to extract semantic vectors from either user queries or system responses, (iii) a dialog policy module to generate a policy vector that contains high-level semantics of the response, and (iv) a dialog generation module to produce appropriate responses. We design a dedicated pre-training objective for each component. Concretely, we pre-train the dialog encoding module with span mask language modeling to learn contextualized dialog information. To capture the structured dialog semantics, we pre-train the dialog understanding module via a novel tree-induced semi-supervised contrastive learning objective with the help of extra dialog annotations. In addition, we pre-train the dialog policy module by minimizing the L2 distance between its output policy vector and the semantic vector of the response for policy optimization. Finally, the dialog generation module is pre-trained by language modeling. Results show that SPACE-3 achieves state-of-the-art performance on eight downstream dialog benchmarks, including intent prediction, dialog state tracking, and end-to-end dialog modeling. We also show that SPACE-3 has a stronger few-shot ability than existing models under the low-resource setting.
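The sketch below illustrates, in PyTorch, the four-module task-flow and two of the pre-training objectives named in the abstract: the L2 distance between the policy vector and the response's semantic vector, and language modeling on the response. Everything here is an assumption made for illustration (module names, a single small transformer encoder, mean pooling as the "semantic vector", random toy inputs); it is not the authors' implementation of SPACE-3, which also includes span mask language modeling and the tree-induced semi-supervised contrastive objective not shown here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyUnifiedDialogModel(nn.Module):
    # Hypothetical simplification of the four successive components.
    def __init__(self, vocab_size=1000, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # (i) dialog encoding module: contextualizes the token sequence
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        # (ii) dialog understanding module: produces a semantic vector
        self.understand = nn.Linear(d_model, d_model)
        # (iii) dialog policy module: predicts a policy vector with
        #       high-level semantics of the response
        self.policy = nn.Linear(d_model, d_model)
        # (iv) dialog generation module: language-modeling head
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, history_ids, response_ids):
        h_hist = self.encoder(self.embed(history_ids))
        h_resp = self.encoder(self.embed(response_ids))
        # Mean pooling as the semantic vector is an assumption of this sketch.
        sem_resp = self.understand(h_resp.mean(dim=1))
        policy_vec = self.policy(self.understand(h_hist.mean(dim=1)))
        logits = self.lm_head(h_resp)
        return sem_resp, policy_vec, logits

def pretraining_losses(model, history_ids, response_ids):
    sem_resp, policy_vec, logits = model(history_ids, response_ids)
    # Policy objective: squared L2 distance between the policy vector and the
    # semantic vector of the response (MSE used as a stand-in here).
    policy_loss = F.mse_loss(policy_vec, sem_resp.detach())
    # Generation objective: next-token language modeling on the response.
    lm_loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        response_ids[:, 1:].reshape(-1),
    )
    return policy_loss, lm_loss

# Toy usage with random token ids (batch of 2, sequence length 8).
model = ToyUnifiedDialogModel()
hist = torch.randint(0, 1000, (2, 8))
resp = torch.randint(0, 1000, (2, 8))
policy_loss, lm_loss = pretraining_losses(model, hist, resp)
print(policy_loss.item(), lm_loss.item())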