NewB: 200,000+ Sentences for Political Bias Detection

Paper Title

NewB: 200,000+ Sentences for Political Bias Detection

Paper Authors

Wei, Jerry

Paper Abstract

We present the Newspaper Bias Dataset (NewB), a text corpus of more than 200,000 sentences from eleven news sources regarding Donald Trump. While previous datasets have labeled sentences as either liberal or conservative, NewB covers the political views of eleven popular media sources, capturing more nuanced political viewpoints than a traditional binary classification system does. We train two state-of-the-art deep learning models to predict the news source of a given sentence from eleven newspapers and find that a recurrent neural network achieved top-1, top-3, and top-5 accuracies of 33.3%, 61.4%, and 77.6%, respectively, significantly outperforming a baseline logistic regression model's accuracies of 18.3%, 42.6%, and 60.8%. Using the news source label of sentences, we analyze the top n-grams with our model to gain meaningful insight into the portrayal of Trump by media sources. We hope that the public release of our dataset will encourage further research in using natural language processing to analyze more complex political biases. Our dataset is posted at https://github.com/JerryWeiAI/NewB.
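
The top-1, top-3, and top-5 accuracies quoted in the abstract count a prediction as correct when the true news source appears among the k highest-scored sources. The sketch below illustrates that metric only; it is not the authors' evaluation code. The function name `top_k_accuracy` and the toy scores and labels are hypothetical, chosen for illustration (four classes instead of the dataset's eleven).

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of examples whose true label is among the k highest-scored classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one example")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy data (hypothetical): 3 sentences, 4 candidate sources.
scores = [
    [0.10, 0.60, 0.20, 0.10],  # true source 1 is ranked first
    [0.50, 0.30, 0.15, 0.05],  # true source 1 is ranked second
    [0.40, 0.30, 0.20, 0.10],  # true source 3 is ranked last
]
labels = [1, 1, 3]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(scores, labels, k):.3f}")
```

With eleven sources, a classifier that guesses uniformly at random would score about 1/11 (9.1%) top-1, 3/11 (27.3%) top-3, and 5/11 (45.5%) top-5, which gives a reference point for reading the reported numbers.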
