Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions

Paper title

Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions

Authors

Das, Rohan Kumar; Kinnunen, Tomi; Huang, Wen-Chin; Ling, Zhenhua; Yamagishi, Junichi; Zhao, Yi; Tian, Xiaohai; Toda, Tomoki

Abstract

The Voice Conversion Challenge 2020 is the third edition under its flagship that promotes intra-lingual semiparallel and cross-lingual voice conversion (VC). While the primary evaluation of the challenge submissions was done through crowd-sourced listening tests, we also performed an objective assessment of the submitted systems. The aim of the objective assessment is to provide complementary performance analysis that may be more beneficial than the time-consuming listening tests. In this study, we examined five types of objective assessments using automatic speaker verification (ASV), neural speaker embeddings, spoofing countermeasures, predicted mean opinion scores (MOS), and automatic speech recognition (ASR). Each of these objective measures assesses the VC output along different aspects. We observed that the correlations of these objective assessments with the subjective results were high for ASV, neural speaker embedding, and ASR, which makes them more influential for predicting subjective test results. In addition, we performed spoofing assessments on the submitted systems and identified some of the VC methods showing a potentially high security risk.
