Paper Title
From Speech-to-Speech Translation to Automatic Dubbing
Paper Authors
Paper Abstract
We present enhancements to a speech-to-speech translation pipeline in order to perform automatic dubbing. Our architecture features neural machine translation that generates output of a preferred length, prosodic alignment of the translation with the original speech segments, neural text-to-speech with fine-tuning of the duration of each utterance, and, finally, audio rendering that enriches the text-to-speech output with background noise and reverberation extracted from the original audio. We report on a subjective evaluation of automatic dubbing of excerpts of TED Talks from English into Italian, which measures the perceived naturalness of automatic dubbing and the relative importance of each proposed enhancement.
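
As a rough illustration of the duration fine-tuning step mentioned in the abstract, the sketch below computes a speaking-rate factor that would make a synthesized utterance fit the duration of its original speech segment. The function name, the clamp bounds, and the use of a single tempo factor per utterance are assumptions made for illustration, not details taken from the paper.

```python
def speaking_rate_factor(
    tts_duration_s: float,
    target_duration_s: float,
    min_rate: float = 0.8,   # assumed lower bound, illustrative only
    max_rate: float = 1.25,  # assumed upper bound, illustrative only
) -> float:
    """Return a tempo factor (>1 speeds speech up) to fit the target duration.

    Hypothetical helper: the factor is clamped so that the synthesized
    speech is never stretched or compressed beyond the assumed bounds.
    """
    if tts_duration_s <= 0 or target_duration_s <= 0:
        raise ValueError("durations must be positive")
    rate = tts_duration_s / target_duration_s
    return max(min_rate, min(max_rate, rate))
```

For example, a 3.0 s synthesized utterance that must fit a 2.5 s original segment would need a factor of about 1.2, while a 4.0 s utterance targeting a 2.0 s segment would be clamped to the assumed maximum of 1.25 and would still overrun the segment.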