Paper title
Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation
Paper authors
Paper abstract
Dysarthric speech reconstruction (DSR), which aims to improve the quality of dysarthric speech, remains challenging: the speech must not only be restored to normal, but the speaker's identity must also be preserved. Speaker representations extracted by a speaker encoder (SE) optimized for speaker verification have been explored to control speaker identity. However, such an SE may fail to fully capture the characteristics of previously unseen dysarthric speakers. To address this problem, we propose a novel multi-task learning strategy, adversarial speaker adaptation (ASA). The primary task of ASA fine-tunes the SE on speech from the target dysarthric speaker so that it effectively captures identity-related information. The secondary task applies adversarial training to avoid incorporating abnormal speaking patterns into the reconstructed speech, by regularizing the distribution of the reconstructed speech to be close to that of high-quality reference speech. Experiments show that the proposed approach achieves improved speaker similarity and speech naturalness comparable to that of a strong baseline approach. Compared with the original dysarthric speech, the reconstructed speech achieves absolute word error rate reductions of 22.3% and 31.5% for speakers with moderate and moderate-severe dysarthria, respectively. Our demo page is available at: https://wendison.github.io/ASA-DSR-demo/
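To make the two-task structure of ASA more concrete, here is a minimal Python sketch of how a primary objective and an adversarial secondary objective could be combined into one training loss. It is not the authors' implementation. The L1 loss standing in for the primary task, the least-squares (LSGAN-style) adversarial loss for the secondary task, the weight `adv_weight`, and all function names are hypothetical choices made for illustration only.

```python
import math
from typing import Sequence


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error; a hypothetical stand-in for the primary-task loss."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    return math.fsum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def lsgan_generator_loss(disc_scores: Sequence[float]) -> float:
    """Least-squares adversarial loss (assumed): pushes the discriminator's
    scores on reconstructed speech toward 1, the label assumed here for
    high-quality reference speech."""
    return math.fsum((s - 1.0) ** 2 for s in disc_scores) / len(disc_scores)


def lsgan_discriminator_loss(real_scores: Sequence[float],
                             fake_scores: Sequence[float]) -> float:
    """Trains the (assumed) discriminator to score reference speech as 1
    and reconstructed speech as 0."""
    real = math.fsum((s - 1.0) ** 2 for s in real_scores) / len(real_scores)
    fake = math.fsum(s ** 2 for s in fake_scores) / len(fake_scores)
    return real + fake


def asa_total_loss(pred: Sequence[float], target: Sequence[float],
                   disc_scores: Sequence[float], adv_weight: float = 0.1) -> float:
    """Multi-task objective: primary loss plus a weighted adversarial term.
    adv_weight is a hypothetical hyperparameter."""
    return l1_loss(pred, target) + adv_weight * lsgan_generator_loss(disc_scores)


if __name__ == "__main__":
    # Toy values only: two "frames" of features and two discriminator scores.
    print(asa_total_loss([0.5, 1.0], [0.0, 1.0], [0.0, 1.0], adv_weight=0.1))
```

In a typical adversarial setup of this kind, updates alternate: the discriminator is trained with `lsgan_discriminator_loss`, then the SE and reconstruction model are trained with `asa_total_loss`. Whether ASA uses this exact schedule or loss form is an assumption here, not something the abstract states.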