Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble

Paper Title

Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble

Authors

Xiaoye Qu, Jun Zeng, Daizong Liu, Zhefeng Wang, Baoxing Huai, Pan Zhou

Abstract

Distantly-Supervised Named Entity Recognition (DS-NER) effectively alleviates the data scarcity problem in NER by automatically generating training samples. Unfortunately, distant supervision may induce noisy labels, undermining the robustness of the learned models and restricting their practical application. To relieve this problem, recent works adopt self-training teacher-student frameworks to gradually refine the training labels and improve the generalization ability of NER models. However, we argue that the performance of current self-training frameworks for DS-NER is severely limited by their plain designs, which suffer from both inadequate student learning and coarse-grained teacher updating. Therefore, in this paper, we make the first attempt to alleviate these issues by proposing: (1) adaptive teacher learning, which jointly trains two teacher-student networks and considers both consistent and inconsistent predictions between the two teachers, thus promoting comprehensive student learning; and (2) a fine-grained student ensemble, which updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, enhancing the consistency of each model fragment's predictions against noise. To verify the effectiveness of our proposed method, we conduct experiments on four DS-NER datasets. The experimental results demonstrate that our method significantly surpasses previous SOTA methods.
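
The fine-grained student ensemble in the abstract updates each teacher fragment with a temporal moving average of the matching student fragment. The abstract gives no formula, so the following is only a minimal sketch under stated assumptions: the update is taken to be the usual exponential moving average, teacher = m * teacher + (1 - m) * student, and each fragment is given its own momentum m. The fragment names, the flat-list weight layout, the momentum values, and the helper names `ema_update_fragment` and `fine_grained_update` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical layout: fragment name -> flat list of weights.
Params = Dict[str, List[float]]


def ema_update_fragment(teacher: List[float], student: List[float],
                        momentum: float) -> List[float]:
    """Return momentum * teacher + (1 - momentum) * student, element-wise."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student fragments must have the same size")
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher, student)]


def fine_grained_update(teacher: Params, student: Params,
                        momenta: Dict[str, float]) -> Params:
    """Update every teacher fragment from the matching student fragment.

    Giving each fragment its own momentum is an assumption of this sketch.
    """
    return {
        name: ema_update_fragment(weights, student[name], momenta[name])
        for name, weights in teacher.items()
    }


# Toy weights, chosen only to make the arithmetic easy to follow.
teacher = {"encoder": [1.0, 1.0], "classifier": [0.0]}
student = {"encoder": [0.0, 2.0], "classifier": [1.0]}
momenta = {"encoder": 0.5, "classifier": 0.75}

updated = fine_grained_update(teacher, student, momenta)
print(updated)  # {'encoder': [0.5, 1.5], 'classifier': [0.25]}
```

A coarse-grained update would correspond to using one momentum for the whole model. The per-fragment dictionary here only illustrates how the teacher could track the student at a different rate in different parts of the network.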
