Paper Title

Synthesizing Personalized Non-speech Vocalization from Discrete Speech Representations

Author

Hsu, Chin-Cheng

Abstract

We formulated non-speech vocalization (NSV) modeling as a text-to-speech task and verified its viability. Specifically, we evaluated the phonetic expressivity of HuBERT speech units on NSVs and verified our model's ability to control speaker timbre even though the training data is few-shot with respect to speakers. In addition, we substantiated that heterogeneity in recording conditions is the major obstacle to NSV modeling. Finally, we discussed five improvements to our method for future research. Audio samples of synthesized NSVs are available on our demo page: https://resemble-ai.github.io/reLaugh.
