Generalizing to the Future: Mitigating Entity Bias in Fake News Detection

Paper Title

Generalizing to the Future: Mitigating Entity Bias in Fake News Detection

Authors

Yongchun Zhu, Qiang Sheng, Juan Cao, Shuokai Li, Danding Wang, Fuzhen Zhuang

Abstract

The wide dissemination of fake news increasingly threatens both individuals and society. Fake news detection aims to train a model on past news and use it to detect fake news in the future. Although great efforts have been made, existing fake news detection methods overlook the unintended entity bias in real-world data, which seriously impairs a model's ability to generalize to future data. For example, 97% of the news pieces from 2010-2017 that contain the entity "Donald Trump" are real in our data, but the percentage falls to merely 33% in 2018. A model trained on the former set would therefore hardly generalize to the latter, because it tends to predict news pieces about "Donald Trump" as real in order to achieve a lower training loss. In this paper, we propose an entity debiasing framework (ENDEF), which generalizes fake news detection models to future data by mitigating entity bias from a cause-effect perspective. Based on the causal graph among entities, news contents, and news veracity, we separately model the contribution of each cause (entities and contents) during training. In the inference stage, we remove the direct effect of the entities to mitigate entity bias. Extensive offline experiments on English and Chinese datasets demonstrate that the proposed framework can largely improve the performance of base fake news detectors, and online tests verify its superiority in practice. To the best of our knowledge, this is the first work to explicitly improve the generalization ability of fake news detection models to future data. The code has been released at https://github.com/ICTMCG/ENDEF-SIGIR2022.
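The train-time and inference-time behavior described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the released implementation. The weighted fusion of the two branches, the auxiliary entity loss, the names `ALPHA` and `BETA`, and the label convention (1 = fake) are all assumptions made for illustration; the abstract states only that the contributions of entities and contents are modeled separately during training and that the direct effect of entities is removed at inference.

```python
import math

# Hypothetical weights, chosen for illustration only.
ALPHA = 0.8  # assumed weight of the content branch in the fused training prediction
BETA = 0.2   # assumed weight of the auxiliary entity-only loss


def sigmoid(z: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def bce(p: float, y: int) -> float:
    """Binary cross-entropy for one example (assumed convention: y = 1 means fake)."""
    eps = 1e-12  # guards against log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))


def training_loss(content_logit: float, entity_logit: float, label: int) -> float:
    """Training: both causes (content and entities) contribute to the prediction."""
    p_content = sigmoid(content_logit)
    p_entity = sigmoid(entity_logit)
    # Assumed fusion: a convex combination of the two branch probabilities.
    p_fused = ALPHA * p_content + (1.0 - ALPHA) * p_entity
    # The auxiliary term encourages the entity branch to absorb entity-driven signal.
    return bce(p_fused, label) + BETA * bce(p_entity, label)


def predict(content_logit: float, entity_logit: float) -> float:
    """Inference: the entity branch is dropped, removing the direct effect of entities."""
    del entity_logit  # deliberately unused at inference time
    return sigmoid(content_logit)


# The entity branch leans strongly towards "real" (negative logit),
# while the content branch indicates "fake" (positive logit).
loss = training_loss(content_logit=2.0, entity_logit=-3.0, label=1)
score = predict(content_logit=2.0, entity_logit=-3.0)
```

In this sketch the inference score depends only on the content branch, so a shift in how often a given entity appears in real versus fake news changes the prediction only through what the content branch itself has learned.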
