Title

Building African Voices

Authors

Perez Ogayo, Graham Neubig, Alan W Black

Abstract

Modern speech synthesis techniques can produce natural-sounding speech given sufficient high-quality data and compute resources. However, such data is not readily available for many languages. This paper focuses on speech synthesis for low-resourced African languages, from corpus creation to sharing and deploying the Text-to-Speech (TTS) systems. We first create a set of general-purpose instructions on building speech synthesis systems with minimum technological resources and subject-matter expertise. Next, we create new datasets and curate datasets from "found" data (existing recordings) through a participatory approach while considering accessibility, quality, and breadth. We demonstrate that we can develop synthesizers that generate intelligible speech with 25 minutes of created speech, even when recorded in suboptimal environments. Finally, we release the speech data, code, and trained voices for 12 African languages to support researchers and developers.
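
The abstract's headline figure is a corpus of about 25 minutes of recorded speech. When collecting recordings for a new language, a first practical check is whether the corpus has reached that size. The sketch below is illustrative and is not the authors' tooling. It assumes the recordings are stored as uncompressed WAV files under a single directory. The helper names (`wav_duration_seconds`, `corpus_minutes`, `meets_target`) are hypothetical. The code uses only Python's standard `wave` and `pathlib` modules.

```python
import wave
from pathlib import Path

# Threshold taken from the abstract; assumed here to mean total recorded audio.
TARGET_MINUTES = 25


def wav_duration_seconds(path):
    """Return the duration of one uncompressed WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def corpus_minutes(directory):
    """Sum the durations of all *.wav files under `directory` (recursively), in minutes."""
    total_seconds = sum(wav_duration_seconds(p) for p in Path(directory).rglob("*.wav"))
    return total_seconds / 60.0


def meets_target(directory, target_minutes=TARGET_MINUTES):
    """Hypothetical check: has the corpus reached the target amount of speech?"""
    return corpus_minutes(directory) >= target_minutes
```

For example, `meets_target("recordings/")` returns `False` until the WAV files in that directory add up to at least 25 minutes, which is 1500 seconds. Raw duration includes any leading or trailing silence, so this check slightly overestimates the amount of usable speech.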