Paper Title

CEFR-Based Sentence Difficulty Annotation and Assessment

Paper Authors

Yuki Arase, Satoru Uchida, Tomoyuki Kajiwara

Paper Abstract

Controllable text simplification is a crucial assistive technique for language learning and teaching. One of the primary factors hindering its advancement is the lack of a corpus annotated with sentence difficulty levels based on language ability descriptions. To address this problem, we created the CEFR-based Sentence Profile (CEFR-SP) corpus, containing 17k English sentences annotated with the levels based on the Common European Framework of Reference for Languages assigned by English-education professionals. In addition, we propose a sentence-level assessment model to handle unbalanced level distribution because the most basic and highly proficient sentences are naturally scarce. In the experiments in this study, our method achieved a macro-F1 score of 84.5% in the level assessment, thus outperforming strong baselines employed in readability assessment.
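The abstract reports macro-F1 for the level assessment because the CEFR level distribution is imbalanced: macro averaging gives every level equal weight, so performance on the naturally scarce basic (A1) and highly proficient (C2) sentences counts as much as performance on the common intermediate levels. The sketch below is not the authors' evaluation code; it only illustrates, with hypothetical gold labels and predictions, how such a macro-F1 score could be computed using scikit-learn.

```python
# Minimal sketch: macro-F1 for CEFR-level sentence classification.
# The labels and predictions below are illustrative only, not data
# from the CEFR-SP corpus.
from sklearn.metrics import f1_score

# CEFR levels, from most basic (A1) to most proficient (C2)
levels = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Hypothetical gold labels and model predictions for a few sentences
y_true = ["A1", "B1", "B1", "B2", "C1", "A2", "B2", "C2"]
y_pred = ["A1", "B1", "B2", "B2", "C1", "A2", "B1", "C1"]

# Macro-F1 averages the per-level F1 scores with equal weight, so rare
# levels influence the score as much as frequent ones.
macro_f1 = f1_score(y_true, y_pred, labels=levels, average="macro")
print(f"macro-F1 = {macro_f1:.3f}")
```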
