Misinformation Has High Perplexity

Paper Title

Misinformation Has High Perplexity

Authors

Nayeon Lee, Yejin Bang, Andrea Madotto, Pascale Fung

Abstract

Debunking misinformation is an important and time-critical task, as there could be adverse consequences when misinformation is not quashed promptly. However, the usual supervised approach to debunking via misinformation classification requires human-annotated data and is not suited to the fast time-frame of newly emerging events such as the COVID-19 outbreak. In this paper, we postulate that misinformation itself has higher perplexity compared to truthful statements, and propose to leverage the perplexity to debunk false claims in an unsupervised manner. First, we extract reliable evidence from scientific and news sources according to sentence similarity to the claims. Second, we prime a language model with the extracted evidence and finally evaluate the correctness of given claims based on the perplexity scores at debunking time. We construct two new COVID-19-related test sets, one scientific and the other political in content, and empirically verify that our system performs favorably compared to existing systems. We are releasing these datasets publicly to encourage more research in debunking misinformation on COVID-19 and other topics.
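
The two steps described in the abstract (select evidence by similarity to the claim, then score the claim by its perplexity given that evidence) can be illustrated with a toy sketch. This is not the authors' system: the abstract does not specify the similarity measure or the language model, so the sketch assumes Jaccard token overlap for similarity and an add-one-smoothed unigram model fit on the evidence as a stand-in for a primed language model. The function names `select_evidence` and `perplexity`, the `vocab_size` parameter, and the example sentences are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; kept deliberately simple)."""
    return text.lower().split()


def select_evidence(claim, candidates, k=2):
    """Return the k candidate sentences most similar to the claim.

    Similarity here is Jaccard token overlap, a stand-in for whatever
    sentence-similarity measure a real system would use.
    """
    claim_tokens = set(tokenize(claim))

    def overlap(sentence):
        tokens = set(tokenize(sentence))
        union = claim_tokens | tokens
        return len(claim_tokens & tokens) / len(union) if union else 0.0

    return sorted(candidates, key=overlap, reverse=True)[:k]


def perplexity(claim, evidence, vocab_size=1000):
    """Perplexity of the claim under a unigram model fit on the evidence.

    Add-one smoothing gives unseen words a small non-zero probability.
    vocab_size is an assumed vocabulary size used only for smoothing.
    """
    counts = Counter(tok for sentence in evidence for tok in tokenize(sentence))
    total = sum(counts.values())
    tokens = tokenize(claim)
    if not tokens:
        raise ValueError("claim must contain at least one token")
    log_prob = sum(
        math.log((counts[tok] + 1) / (total + vocab_size)) for tok in tokens
    )
    # Perplexity is the exponential of the average negative log-probability.
    return math.exp(-log_prob / len(tokens))


if __name__ == "__main__":
    # Hypothetical evidence pool and claims, for illustration only.
    corpus = [
        "vaccines reduce the risk of severe illness",
        "masks reduce the spread of respiratory droplets",
        "the stock market closed higher on friday",
    ]
    for claim in ["vaccines reduce severe illness", "garlic cures severe illness"]:
        evidence = select_evidence(claim, corpus, k=2)
        print(claim, "->", round(perplexity(claim, evidence), 1))
```

Under the abstract's hypothesis, a claim that the evidence supports should receive a lower score than one it does not. A unigram model ignores word order, so this sketch only captures vocabulary overlap; it shows the shape of the pipeline rather than its accuracy.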
