A Multicriteria Evaluation for Data-Driven Programming Feedback Systems: Accuracy, Effectiveness, Fallibility, and Students' Response

Paper Title

A Multicriteria Evaluation for Data-Driven Programming Feedback Systems: Accuracy, Effectiveness, Fallibility, and Students' Response

Authors

Shabrina, Preya; Marwan, Samiha; Bennison, Andrew; Chi, Min; Price, Thomas; Barnes, Tiffany

Abstract

Data-driven programming feedback systems can help novices program in the absence of a human tutor. Prior evaluations showed that these systems improve learning in terms of test scores or task completion efficiency. However, these evaluations ignore crucial aspects that can impact learning or reveal insights important for the future improvement of such systems. These aspects include the inherent fallibility of current state-of-the-art systems, students' programming behavior in response to correct/incorrect feedback, and effective/ineffective system components. Consequently, a great deal of knowledge is yet to be discovered about such systems. In this paper, we apply a multi-criteria evaluation with 5 criteria to a data-driven feedback system integrated within a block-based novice programming environment. Each criterion in the evaluation reveals a unique pivotal aspect of the system: 1) how accurate the feedback system is; 2) how it guides students throughout programming tasks; 3) how it helps students in task completion; 4) what happens when it goes wrong; and 5) how students respond generally to the system. Our evaluation results showed that the system was helpful to students due to its effective design and feedback representation despite being fallible. However, novices can be negatively impacted by this fallibility due to high reliance on the system and a lack of self-evaluation. The negative impacts include increased working time and the implementation or submission of incorrect or partially correct solutions. The evaluation results reinforced the necessity of multi-criteria system evaluations while revealing important insights helpful for ensuring proper usage of data-driven feedback systems, designing fallibility mitigation steps, and driving research for future improvement.
