Paper Title
Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition
Paper Authors
Abstract
Code-switching poses a number of challenges and opportunities for multilingual automatic speech recognition. In this paper, we focus on the question of robust and fair evaluation metrics. To that end, we develop a reference benchmark data set of code-switching speech recognition hypotheses with human judgments. We define clear guidelines for minimal editing of automatic hypotheses, and we validate the guidelines using 4-way inter-annotator agreement. We evaluate a large number of metrics in terms of correlation with human judgments. The metrics we consider vary in terms of representation (orthographic, phonological, semantic), directness (intrinsic vs. extrinsic), granularity (e.g., word, character), and similarity computation method. The highest correlation with human judgment is achieved using transliteration followed by text normalization. We release the first corpus of human acceptance judgments for code-switching speech recognition output in dialectal Arabic/English conversational speech.