Paper title
L2 proficiency assessment using self-supervised speech representations
Paper authors
Paper abstract
Demand for automated spoken language assessment systems has grown in recent years. A standard pipeline starts with a speech recognition system and derives features, either hand-crafted or based on deep learning, that exploit the transcription and audio. Though these approaches can yield high-performance systems, they require speech recognition systems that can be used for L2 speakers, preferably tuned to the specific form of test being deployed. Recently, a scheme based on self-supervised speech representations, requiring no speech recognition, was proposed. This work extends the initial analysis of this approach to a large-scale proficiency test, Linguaskill, that comprises multiple parts, each designed to assess different attributes of a candidate's speaking proficiency. The performance of the self-supervised wav2vec 2.0 system is compared to a high-performance hand-crafted assessment system and a BERT-based text system, both of which use speech transcriptions. Though the wav2vec 2.0 based system is found to be sensitive to the nature of the response, it can be configured to yield performance comparable to systems requiring a speech transcription, and it yields gains when appropriately combined with standard approaches.