Paper title
Improved Synthetic Training for Reading Comprehension
Paper authors
Paper abstract
Automatically generated synthetic training examples have been shown to improve performance in machine reading comprehension (MRC). Compared to human-annotated gold-standard data, synthetic training data has unique properties, such as high availability at the possible expense of quality. In view of such differences, we explore novel applications of synthetic examples to MRC. Our proposed pre-training and knowledge distillation strategies show significant improvements over existing methods. In a particularly surprising discovery, we observe that synthetic distillation often yields students that can outperform the teacher model.