Paper Title

Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?

Paper Authors

Xuan Shi, Erica Cooper, Xin Wang, Junichi Yamagishi, Shrikanth Narayanan

Paper Abstract

Given the similarity between music and speech synthesis from symbolic input, and the rapid development of text-to-speech (TTS) techniques, it is worthwhile to explore ways to improve MIDI-to-audio performance by borrowing from TTS techniques. In this study, we analyze the shortcomings of a TTS-based MIDI-to-audio system and improve it in terms of feature computation, model selection, and training strategy, aiming to synthesize highly natural-sounding audio. Moreover, we conduct an extensive model evaluation through listening tests, pitch measurement, and spectrogram analysis. This work not only demonstrates the synthesis of highly natural-sounding music but also offers a thorough analytical approach and useful outcomes for the community. Our code, pre-trained models, supplementary materials, and audio samples are open-sourced at https://github.com/nii-yamagishilab/midi-to-audio.
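To illustrate the TTS-style framing the abstract describes (symbolic input mapped to acoustic features by an acoustic model, then to a waveform by a vocoder), the sketch below prepares frame-level MIDI piano-roll inputs and mel-spectrogram targets, the kind of aligned input/output pairs such an acoustic model would be trained on. This is a minimal, hypothetical example using pretty_midi and librosa; the file paths and feature parameters are placeholders and do not reflect the authors' released code.

```python
# Minimal sketch (not the paper's released code): compute aligned
# MIDI piano-roll inputs and log-mel targets, mirroring the
# symbolic-input -> acoustic-feature pairing used in TTS acoustic models.
import librosa
import numpy as np
import pretty_midi

SR = 24000      # audio sampling rate (placeholder)
HOP = 300       # hop length in samples -> 12.5 ms frames (placeholder)
N_MELS = 80     # mel bins, as in common TTS front ends

def midi_to_pianoroll(midi_path: str, frame_rate: float) -> np.ndarray:
    """Piano roll of shape (128, T), sampled at the mel frame rate."""
    pm = pretty_midi.PrettyMIDI(midi_path)
    return pm.get_piano_roll(fs=frame_rate)

def audio_to_logmel(wav_path: str) -> np.ndarray:
    """Log-mel spectrogram of shape (N_MELS, T)."""
    y, _ = librosa.load(wav_path, sr=SR)
    mel = librosa.feature.melspectrogram(
        y=y, sr=SR, n_fft=2048, hop_length=HOP, n_mels=N_MELS)
    return np.log(np.maximum(mel, 1e-10))

if __name__ == "__main__":
    frame_rate = SR / HOP                               # frames per second
    roll = midi_to_pianoroll("piece.mid", frame_rate)   # placeholder path
    mel = audio_to_logmel("piece.wav")                  # placeholder path
    # Trim to a common length so (input, target) frames align one-to-one.
    T = min(roll.shape[1], mel.shape[1])
    roll, mel = roll[:, :T], mel[:, :T]
    print(roll.shape, mel.shape)
```

Such pairs would then feed a sequence-to-sequence acoustic model and a neural vocoder; the paper's actual feature computation, model choices, and training strategy are described in the work itself and its repository.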
