Paper Title

Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?

Paper Authors

Xuan Shi, Erica Cooper, Xin Wang, Junichi Yamagishi, Shrikanth Narayanan

Paper Abstract

Given the similarity between music and speech synthesis from symbolic input, and the rapid development of text-to-speech (TTS) techniques, it is worthwhile to explore ways to improve MIDI-to-audio performance by borrowing from TTS techniques. In this study, we analyze the shortcomings of a TTS-based MIDI-to-audio system and improve it in terms of feature computation, model selection, and training strategy, aiming to synthesize highly natural-sounding audio. Moreover, we conduct an extensive model evaluation through listening tests, pitch measurement, and spectrogram analysis. This work not only demonstrates the synthesis of highly natural-sounding music but also offers a thorough analytical approach and useful outcomes for the community. Our code, pre-trained models, supplementary materials, and audio samples are open-sourced at https://github.com/nii-yamagishilab/midi-to-audio.
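To illustrate the TTS-style framing the abstract describes (symbolic input mapped to acoustic features by an acoustic model, then to a waveform by a vocoder), the sketch below prepares frame-level MIDI piano-roll inputs and mel-spectrogram targets, the kind of aligned input/output pairs such an acoustic model would be trained on. This is a minimal, hypothetical example using pretty_midi and librosa; the file paths and feature parameters are placeholders and do not reflect the authors' released code.

```python
# Minimal sketch (not the paper's released code): compute aligned
# MIDI piano-roll inputs and log-mel targets, mirroring the
# symbolic-input -> acoustic-feature pairing used in TTS acoustic models.
import librosa
import numpy as np
import pretty_midi

SR = 24000      # audio sampling rate (placeholder)
HOP = 300       # hop length in samples -> 12.5 ms frames (placeholder)
N_MELS = 80     # mel bins, as in common TTS front ends

def midi_to_pianoroll(midi_path: str, frame_rate: float) -> np.ndarray:
    """Piano roll of shape (128, T), sampled at the mel frame rate."""
    pm = pretty_midi.PrettyMIDI(midi_path)
    return pm.get_piano_roll(fs=frame_rate)

def audio_to_logmel(wav_path: str) -> np.ndarray:
    """Log-mel spectrogram of shape (N_MELS, T)."""
    y, _ = librosa.load(wav_path, sr=SR)
    mel = librosa.feature.melspectrogram(
        y=y, sr=SR, n_fft=2048, hop_length=HOP, n_mels=N_MELS)
    return np.log(np.maximum(mel, 1e-10))

if __name__ == "__main__":
    frame_rate = SR / HOP                               # frames per second
    roll = midi_to_pianoroll("piece.mid", frame_rate)   # placeholder path
    mel = audio_to_logmel("piece.wav")                  # placeholder path
    # Trim to a common length so (input, target) frames align one-to-one.
    T = min(roll.shape[1], mel.shape[1])
    roll, mel = roll[:, :T], mel[:, :T]
    print(roll.shape, mel.shape)
```

Such pairs would then feed a sequence-to-sequence acoustic model and a neural vocoder; the paper's actual feature computation, model choices, and training strategy are described in the work itself and its repository.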
