Paper Title

ILDAE: Instance-Level Difficulty Analysis of Evaluation Data

Authors

Neeraj Varshney, Swaroop Mishra, Chitta Baral

Abstract

Knowledge of questions' difficulty level helps a teacher in several ways, such as estimating students' potential quickly by asking carefully selected questions and improving the quality of an examination by modifying trivial and hard questions. Can we extract such benefits of instance difficulty in NLP? To this end, we conduct Instance-Level Difficulty Analysis of Evaluation data (ILDAE) in a large-scale setup of 23 datasets and demonstrate its five novel applications: 1) conducting efficient-yet-accurate evaluations with fewer instances, saving computational cost and time; 2) improving the quality of existing evaluation datasets by repairing erroneous and trivial instances; 3) selecting the best model based on application requirements; 4) analyzing dataset characteristics to guide future data creation; and 5) estimating Out-of-Domain performance reliably. Comprehensive experiments for these applications yield several interesting findings. For example, evaluation using just 5% of the instances (selected via ILDAE) achieves a Kendall correlation as high as 0.93 with evaluation on the complete dataset, and computing weighted accuracy using difficulty scores leads to a 5.2% higher correlation with Out-of-Domain performance. We release the difficulty scores and hope our analyses and findings will bring more attention to this important yet understudied area of leveraging instance difficulty in evaluations.
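As a rough illustration of the two quantities the abstract reports, the sketch below computes (a) the Kendall correlation between model rankings obtained on a small evaluation subset and on the full dataset, and (b) a difficulty-weighted accuracy. Everything here is an assumption for illustration only: the toy difficulty scores and model predictions are made up, the subset indices are hypothetical rather than chosen by ILDAE's selection procedure, and the weighting (each instance counts in proportion to its difficulty score) is one plausible choice that may differ from the paper's exact formula. Kendall's tau is implemented directly (tau-a, tied pairs ignored) to keep the example dependency-free.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs; tied pairs count as neither."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def accuracy(correct, indices):
    """Plain accuracy of a 0/1 correctness vector restricted to the given instance indices."""
    return sum(correct[i] for i in indices) / len(indices)


def weighted_accuracy(correct, difficulty):
    """Hypothetical weighting: each instance counts in proportion to its difficulty score."""
    return sum(d * c for d, c in zip(difficulty, correct)) / sum(difficulty)


# Toy data (made up): per-instance difficulty scores and 0/1 correctness of three models.
difficulty = [0.1, 0.2, 0.5, 0.7, 0.9, 0.3]
models = {
    "model_a": [1, 1, 1, 1, 1, 0],
    "model_b": [1, 1, 1, 0, 0, 1],
    "model_c": [1, 0, 0, 0, 0, 1],
}
full = list(range(len(difficulty)))
subset = [2, 3]  # hypothetical small evaluation subset, not selected by ILDAE

names = sorted(models)
full_acc = [accuracy(models[m], full) for m in names]
subset_acc = [accuracy(models[m], subset) for m in names]
print("Kendall tau, subset vs. full:", kendall_tau(subset_acc, full_acc))
for m in names:
    print(m, "weighted accuracy:", round(weighted_accuracy(models[m], difficulty), 3))
```

A high tau means the small subset ranks models the same way the full dataset does, which is the property the 5%-subset result measures. With many models and possible ties, `scipy.stats.kendalltau` (which computes tau-b by default) is a more robust choice than this hand-rolled tau-a.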