Improving pronunciation assessment via ordinal regression with anchored reference samples

Paper title

Improving pronunciation assessment via ordinal regression with anchored reference samples

Authors

Su, Bin; Mao, Shaoguang; Soong, Frank; Xia, Yan; Tien, Jonathan; Wu, Zhiyong

Abstract

Sentence-level pronunciation assessment is important for Computer Assisted Language Learning (CALL). Traditional speech pronunciation assessment, based on the Goodness of Pronunciation (GOP) algorithm, has some weaknesses in assessing a speech utterance: 1) phoneme GOP scores cannot be easily translated into a sentence score with a simple average for effective assessment; 2) the rank ordering information has not been well exploited in GOP scoring to deliver a robust assessment that correlates well with a human rater's evaluations. In this paper, we propose two new statistical features, average GOP (aGOP) and confusion GOP (cGOP), and use them to train a binary classifier in Ordinal Regression with Anchored Reference Samples (ORARS). When the proposed approach is tested on the Microsoft mTutor ESL Dataset, a relative improvement of 26.9% in the Pearson correlation coefficient is obtained over the conventional GOP-based approach. The performance is at a human-parity level or better than human raters.
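The evaluation quantities named in the abstract can be illustrated with a minimal Python sketch. It shows only the baseline the abstract criticizes (a sentence score taken as the simple average of phoneme GOP scores), the Pearson correlation with human scores, and how a relative improvement is computed. It does not implement aGOP, cGOP, or ORARS, whose definitions are not given here. All function names and every number below are hypothetical and chosen purely for illustration; none of them come from the paper or the mTutor dataset.

```python
from math import sqrt


def sentence_score(phoneme_gops):
    """Baseline sentence score: plain mean of phoneme-level GOP scores."""
    return sum(phoneme_gops) / len(phoneme_gops)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def relative_improvement(baseline, proposed):
    """Relative gain of `proposed` over `baseline`, as a fraction."""
    return (proposed - baseline) / baseline


# Hypothetical toy data: per-phoneme GOP scores for three utterances
# and made-up human ratings for the same utterances.
toy_gops = [[-1.0, -3.0], [-0.5, -0.5], [-2.0, -4.0]]
toy_human = [3.0, 5.0, 2.0]

machine_scores = [sentence_score(u) for u in toy_gops]
toy_correlation = pearson(machine_scores, toy_human)

# Hypothetical correlations, used only to show the arithmetic.
toy_gain = relative_improvement(0.50, 0.60)
```

A relative improvement is measured against the baseline value, so a correlation rising from a hypothetical 0.50 to 0.60 is a 20% relative gain, not a 10-point one. The 26.9% figure in the abstract is a relative gain of this kind.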
