End-to-end Named Entity Recognition from English Speech

Paper title

End-to-end Named Entity Recognition from English Speech

Authors

Hemant Yadav, Sreyan Ghosh, Yi Yu, Rajiv Ratn Shah

Abstract

Named entity recognition (NER) from text has been a widely studied problem and usually extracts semantic information from text. Until now, NER from speech has mostly been studied as a two-step pipeline that first applies an automatic speech recognition (ASR) system to an audio sample and then passes the predicted transcript to an NER tagger. In such cases, the error does not propagate from one step to another, as the two tasks are not optimized in an end-to-end (E2E) fashion. Recent studies confirm that integrated approaches (e.g., E2E ASR) outperform sequential ones (e.g., phoneme-based ASR). In this paper, we introduce the first publicly available NER-annotated dataset for English speech and present an E2E approach that jointly optimizes the ASR and NER tagger components. Experimental results show that the proposed E2E approach outperforms the classical two-step approach. We also discuss how NER from speech can be used to handle out-of-vocabulary (OOV) words in an ASR system.
