Paper Title
FACTOID: A New Dataset for Identifying Misinformation Spreaders and Political Bias
Authors
Abstract
Proactively identifying misinformation spreaders is an important step towards mitigating the impact of fake news on our society. In this paper, we introduce a new contemporary Reddit dataset for fake news spreader analysis, called FACTOID, monitoring political discussions on Reddit since the beginning of 2020. The dataset contains over 4K users with 3.4M Reddit posts, and includes, beyond the users' binary labels, also their fine-grained credibility level (very low to very high) and their political bias strength (extreme right to extreme left). As far as we are aware, this is the first fake news spreader dataset that simultaneously captures both the long-term context of users' historical posts and the interactions between them. To create the first benchmark on our data, we provide methods for identifying misinformation spreaders by utilizing the social connections between the users along with their psycho-linguistic features. We show that the users' social interactions can, on their own, indicate misinformation spreading, while the psycho-linguistic features are mostly informative in non-neural classification settings. In a qualitative analysis, we observe that detecting affective mental processes correlates negatively with right-biased users, and that the openness to experience factor is lower for those who spread fake news.