Paper Title
Two Stage Transformer Model for COVID-19 Fake News Detection and Fact Checking
Paper Authors
Paper Abstract
The rapid advancement of technology in online communication via social media platforms has led to a sharp rise in the spread of misinformation and fake news. Fake news is especially rampant in the current COVID-19 pandemic, leading people to believe false and potentially harmful claims and stories. Detecting fake news quickly can alleviate the spread of panic, chaos, and potential health hazards. We developed a two-stage automated pipeline for COVID-19 fake news detection using state-of-the-art machine learning models for natural language processing. The first model leverages a novel fact-checking algorithm that retrieves the most relevant facts concerning user claims about COVID-19. The second model verifies the level of truth in the claim by computing the textual entailment between the claim and the true facts retrieved from a manually curated COVID-19 dataset. The dataset is based on a publicly available knowledge source consisting of more than 5000 COVID-19 false claims and verified explanations, a subset of which was internally annotated and cross-validated to train and evaluate our models. We evaluate a series of models, ranging from those based on classical text-based features to more contextual Transformer-based models, and observe that a model pipeline based on BERT and ALBERT for the two stages, respectively, yields the best results.
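The two-stage structure described in the abstract (retrieve relevant facts, then score entailment between fact and claim) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `retrieve_facts`, `check_claim`, and `toy_entails` are hypothetical, the retrieval step uses a plain bag-of-words cosine similarity as a stand-in for the BERT-based retriever, and the entailment scorer is a token-overlap placeholder where the paper uses an ALBERT-based model.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_facts(claim: str, facts: List[str], k: int = 1) -> List[Tuple[str, float]]:
    """Stage 1 (stand-in): rank facts by similarity to the claim, keep the top k."""
    claim_vec = Counter(tokenize(claim))
    scored = [(fact, cosine(claim_vec, Counter(tokenize(fact)))) for fact in facts]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def check_claim(
    claim: str,
    facts: List[str],
    entails: Callable[[str, str], float],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Stage 2: score the claim against each retrieved fact.

    `entails(premise, hypothesis)` is assumed to return a score in [0, 1];
    in a real pipeline it would wrap a trained entailment model.
    """
    retrieved = retrieve_facts(claim, facts, k)
    return [(fact, entails(fact, claim)) for fact, _ in retrieved]


def toy_entails(premise: str, hypothesis: str) -> float:
    """Placeholder scorer: fraction of hypothesis tokens found in the premise.

    This is NOT textual entailment; it ignores negation and word order.
    """
    hyp = tokenize(hypothesis)
    prem = set(tokenize(premise))
    return sum(t in prem for t in hyp) / len(hyp) if hyp else 0.0


if __name__ == "__main__":
    # Illustrative facts written for this sketch, not taken from the paper's dataset.
    facts = [
        "Drinking bleach does not cure COVID-19 and is dangerous.",
        "5G mobile networks do not spread COVID-19.",
    ]
    for fact, score in check_claim("5G networks spread COVID-19", facts, toy_entails):
        print(f"{score:.2f}  {fact}")
```

Note that the placeholder scorer rates the false claim as fully supported, because every claim token appears in the retrieved fact and the negation is ignored. That failure is the reason the second stage needs a trained entailment model rather than lexical overlap.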