Automatic Short Math Answer Grading via In-context Meta-learning

Paper title

Automatic Short Math Answer Grading via In-context Meta-learning

Authors

Mengxue Zhang, Sami Baral, Neil Heffernan, Andrew Lan

Abstract

Automatic short answer grading is an important research direction in the exploration of how to use artificial intelligence (AI)-based tools to improve education. Current state-of-the-art approaches use neural language models to create vectorized representations of students' responses, followed by classifiers that predict the score. However, these approaches have several key limitations: i) they use pre-trained language models that are not well adapted to educational subject domains and/or student-generated text, and ii) they almost always train one model per question, which ignores the linkage across questions and results in a significant model storage problem due to the size of advanced language models. In this paper, we study the problem of automatic short answer grading for students' responses to math questions and propose a novel framework for this task. First, we use MathBERT, a variant of the popular language model BERT adapted to mathematical content, as our base model and fine-tune it for the downstream task of student response grading. Second, we use an in-context learning approach that provides scoring examples as input to the language model, which supplies additional context information and promotes generalization to previously unseen questions. We evaluate our framework on a real-world dataset of student responses to open-ended math questions and show that it outperforms existing approaches, often significantly, especially for new questions that are not seen during training.
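
As a rough illustration of the in-context idea described in the abstract, the sketch below assembles a single text input from a question, the response to be graded, and a few already-scored example responses. The field labels, the `[SEP]` separator string, the sample question and scores, and the function name `build_grading_input` are all assumptions made for this example; the abstract does not specify the actual input format.

```python
from typing import List, Tuple

# Assumed separator; BERT-style models use a [SEP] token, but the exact
# input layout used in the paper is not given in the abstract.
SEP = " [SEP] "


def build_grading_input(
    question: str,
    response: str,
    examples: List[Tuple[str, int]],
    max_examples: int = 3,
) -> str:
    """Concatenate the question, the target response, and scored examples.

    Hypothetical helper: it only shows how scoring examples could be
    supplied as extra context alongside the response being graded.
    """
    parts = [f"question: {question}", f"response: {response}"]
    # Append up to max_examples previously graded responses with their scores.
    for text, score in examples[:max_examples]:
        parts.append(f"example: {text} score: {score}")
    return SEP.join(parts)


# Made-up data for illustration only.
prompt = build_grading_input(
    "What is 3/4 + 1/8?",
    "7/8 because 3/4 is 6/8",
    [("7/8", 4), ("4/12", 0)],
)
print(prompt)
```

Because the scored examples travel with the input rather than being baked into per-question weights, one model can in principle be shared across questions, which is the storage and generalization benefit the abstract points to.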
