Paper title
JVS-MuSiC: Japanese multispeaker singing-voice corpus
Paper authors
Paper abstract
Thanks to developments in machine learning techniques, it has become possible to synthesize high-quality singing voices of a single singer. An open multispeaker singing-voice corpus would further accelerate research in singing-voice synthesis. However, conventional singing-voice corpora consist only of the singing voices of a single singer. We designed a Japanese multispeaker singing-voice corpus called "JVS-MuSiC" with the aim of analyzing and synthesizing a variety of voices. The corpus consists of 100 singers' recordings of the same song, Katatsumuri, which is a Japanese children's song. It also includes another song that is different for each singer. In this paper, we describe the design of the corpus and experimental analyses using JVS-MuSiC. We investigated the relationship 1) between the similarity of singing voices and the perceptual oneness of unison singing voices and 2) between the similarity of singing voices and that of speech. The results suggest that 1) there is a moderate positive correlation between singing-voice similarity and the oneness of unison and that 2) the correlation between singing-voice similarity and speech similarity is weak. This corpus is freely available online.