The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition

Paper Title

The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition

Authors

Mukiibi, Jonathan; Katumba, Andrew; Nakatumba-Nabende, Joyce; Hussein, Ali; Meyer, Josh

Abstract

Building a usable radio monitoring automatic speech recognition (ASR) system is a challenging task for under-resourced languages, yet it is paramount in societies where radio is the main medium of public communication and discussion. Initial efforts by the United Nations in Uganda have shown how important it is for national planning to understand the perceptions of rural people who are excluded from social media. However, these efforts are being challenged by the absence of transcribed speech datasets. In this paper, the Makerere Artificial Intelligence research lab releases a Luganda radio speech corpus of 155 hours. To our knowledge, this is the first publicly available radio dataset in sub-Saharan Africa. The paper describes the development of the voice corpus and presents baseline Luganda ASR performance results using the Coqui STT toolkit, an open source speech recognition toolkit.
