Dialogue Term Extraction using Transfer Learning and Topological Data Analysis

Paper title

Dialogue Term Extraction using Transfer Learning and Topological Data Analysis

Authors

Renato Vukovic, Michael Heck, Benjamin Matthias Ruppik, Carel van Niekerk, Marcus Zibrowius, Milica Gašić

Abstract

Goal-oriented dialogue systems were originally designed as a natural language interface to a fixed dataset of entities that users might inquire about, further described by domains, slots, and values. As we move towards adaptable dialogue systems where knowledge about domains, slots, and values may change, there is an increasing need to automatically extract these terms from raw dialogues or related non-dialogue data on a large scale. In this paper, we take an important step in this direction by exploring different features that can enable systems to discover realizations of domains, slots, and values in dialogues in a purely data-driven fashion. The features that we examine stem from word embeddings, language modelling features, as well as topological features of the word embedding space. To examine the utility of each feature set, we train a seed model based on the widely used MultiWOZ dataset. Then, we apply this model to a different corpus, the Schema-Guided Dialogue dataset. Our method outperforms the previously proposed approach that relies solely on word embeddings. We also demonstrate that each of the features is responsible for discovering different kinds of content. We believe our results warrant further research towards ontology induction, and continued harnessing of topological data analysis for dialogue and natural language processing research.
