Paper Title
Advancing Semi-Supervised Task Oriented Dialog Systems by JSA Learning of Discrete Latent Variable Models
Paper Authors
Paper Abstract
Developing semi-supervised task-oriented dialog (TOD) systems by leveraging unlabeled dialog data has attracted increasing interest. For semi-supervised learning of latent state TOD models, variational learning is often used, but it suffers from the annoying high variance of gradients propagated through discrete latent variables and from the drawback of indirectly optimizing the target log-likelihood. Recently, an alternative algorithm, called joint stochastic approximation (JSA), has emerged for learning discrete latent variable models with impressive performance. In this paper, we propose to apply JSA to semi-supervised learning of latent state TOD models, which we refer to as JSA-TOD. To our knowledge, JSA-TOD represents the first work in developing JSA-based semi-supervised learning of discrete latent variable conditional models for long sequential generation problems such as those in TOD systems. Extensive experiments show that JSA-TOD significantly outperforms its variational learning counterpart. Remarkably, semi-supervised JSA-TOD using 20% labels performs close to the fully-supervised baseline on MultiWOZ2.1.
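The high-variance issue the abstract refers to can be seen in a minimal sketch (not from the paper): for a discrete latent variable, variational learning typically falls back on the score-function (REINFORCE) gradient estimator, which is unbiased but noisy. The toy objective `f` and all numbers below are hypothetical, chosen only to make the variance visible.

```python
import numpy as np

rng = np.random.default_rng(0)

theta = 0.3                       # Bernoulli parameter of the discrete latent z
f = lambda z: (z - 0.6) ** 2      # hypothetical toy objective

# Exact gradient of E[f(z)] w.r.t. theta for a Bernoulli latent:
# d/dtheta [(1-theta)*f(0) + theta*f(1)] = f(1) - f(0)
exact_grad = f(1.0) - f(0.0)

# Score-function estimator: f(z) * d log p(z; theta) / d theta,
# where d log p / d theta = 1/theta if z=1, and -1/(1-theta) if z=0.
z = rng.random(100_000) < theta
score = z / theta - (~z) / (1 - theta)
est = f(z.astype(float)) * score

# The estimator is unbiased (its mean matches exact_grad), but its
# per-sample standard deviation dwarfs the gradient itself.
print(exact_grad, est.mean(), est.std())
```

The per-sample noise here is more than twice the magnitude of the true gradient; in long sequential generation, where many such discrete choices are chained, this noise compounds, which is the motivation the abstract gives for moving from variational learning to JSA.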