Speech MOS multi-task learning and rater bias correction

Paper title

Speech MOS multi-task learning and rater bias correction

Authors

Haleh Akrami, Hannes Gamper

Abstract

Perceptual speech quality is an important performance metric for teleconferencing applications. The mean opinion score (MOS) is standardized for the perceptual evaluation of speech quality and is obtained by asking listeners to rate the quality of a speech sample. Recently, there has been increasing research interest in developing models for estimating MOS blindly. Here we propose a multi-task framework to include additional labels and data in training to improve the performance of a blind MOS estimation model. Experimental results indicate that the proposed model can be trained to jointly estimate MOS, reverberation time (T60), and clarity (C50) by combining two disjoint data sets in training, one containing only MOS labels and the other containing only T60 and C50 labels. Furthermore, we use a semi-supervised framework to combine two MOS data sets in training, one containing only MOS labels (per ITU-T Recommendation P.808), and the other containing separate scores for speech signal, background noise, and overall quality (per ITU-T Recommendation P.835). Finally, we present preliminary results for addressing individual rater bias in the MOS labels.
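One common way to train on disjoint label sets like those described in the abstract is to mask the loss so that each sample contributes only to the tasks it is labeled for. The sketch below illustrates that general idea in plain Python. It is not the authors' loss function: the name masked_multitask_mse, the use of NaN to mark a missing label, the squared-error criterion, and the unweighted sum over tasks are all assumptions made for illustration.

```python
import math


def masked_multitask_mse(predictions, targets):
    """Per-task mean squared error that skips missing labels.

    predictions, targets: lists of dicts mapping task name -> float.
    A target of None or NaN means the sample has no label for that task.
    Returns (sum of per-task losses, dict of per-task losses).
    """
    sums, counts = {}, {}
    for pred, tgt in zip(predictions, targets):
        for task, y in tgt.items():
            if y is None or math.isnan(y):
                continue  # unlabeled for this task: contributes no loss
            err = pred[task] - y
            sums[task] = sums.get(task, 0.0) + err * err
            counts[task] = counts.get(task, 0) + 1
    per_task = {task: sums[task] / counts[task] for task in sums}
    # Assumption: tasks are summed with equal weight.
    total = sum(per_task.values())
    return total, per_task


# Hypothetical mini-batch: sample 0 comes from a MOS-only data set,
# sample 1 from a data set labeled only with T60 (seconds) and C50 (dB).
missing = float("nan")
preds = [
    {"mos": 3.5, "t60": 0.4, "c50": 12.0},
    {"mos": 2.0, "t60": 0.5, "c50": 10.0},
]
targets = [
    {"mos": 4.0, "t60": missing, "c50": missing},
    {"mos": missing, "t60": 0.7, "c50": 8.0},
]
total, per_task = masked_multitask_mse(preds, targets)
```

In this toy batch the MOS error comes only from the first sample and the T60 and C50 errors only from the second, so a model with one output head per task can be trained on both data sets in the same batch. In practice the tasks would likely need weighting or normalization, since MOS, T60, and C50 are on different scales.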
