Paper title
Inference of Media Bias and Content Quality Using Natural-Language Processing
Paper authors
Abstract
Media bias can significantly impact the formation and development of opinions and sentiments in a population. It is thus important to study the emergence and development of partisan media and political polarization. However, it is challenging to quantitatively infer the ideological positions of media outlets. In this paper, we present a quantitative framework to infer both the political bias and the content quality of media outlets from text, and we illustrate this framework with empirical experiments on real-world data. We apply a bidirectional long short-term memory (LSTM) neural network to a data set of more than 1 million tweets to generate a two-dimensional ideological-bias and content-quality measurement for each tweet. We then infer a "media-bias chart" of (bias, quality) coordinates for the media outlets by integrating the (bias, quality) measurements of the tweets of the media outlets. We also apply a variety of baseline machine-learning methods, such as a naive-Bayes method and a support-vector machine (SVM), to infer the bias and quality values for each tweet. All of these baseline methods are based on a bag-of-words approach. We find that the LSTM-network approach performs best among the examined methods. Our results illustrate the importance of incorporating word order into machine-learning methods for text analysis.
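Two ideas in the abstract can be illustrated with a minimal sketch: a bag-of-words representation discards word order, and tweet-level (bias, quality) measurements can be combined into one coordinate pair per outlet. The sketch below is not the paper's implementation. The function names `bag_of_words` and `outlet_coordinates`, the outlet labels, and the numeric scores are all hypothetical, and the use of a simple mean is an assumption, since the abstract says only that the measurements are "integrated".

```python
from collections import Counter, defaultdict
from statistics import mean


def bag_of_words(text):
    """Return word counts for a text; word order is discarded."""
    return Counter(text.lower().split())


def outlet_coordinates(tweet_scores):
    """Average per-tweet (bias, quality) pairs for each outlet.

    tweet_scores: iterable of (outlet, bias, quality) tuples.
    Averaging is an assumption; the abstract only says "integrating".
    """
    by_outlet = defaultdict(list)
    for outlet, bias, quality in tweet_scores:
        by_outlet[outlet].append((bias, quality))
    return {
        outlet: (mean(b for b, _ in pairs), mean(q for _, q in pairs))
        for outlet, pairs in by_outlet.items()
    }


# Two sentences with different meanings but identical word counts.
a = "the senator attacked the reporter"
b = "the reporter attacked the senator"
same_bag = bag_of_words(a) == bag_of_words(b)

# Made-up scores for two hypothetical outlets.
scores = [
    ("outlet_a", -2.0, 40.0),
    ("outlet_a", -4.0, 50.0),
    ("outlet_b", 3.0, 30.0),
]
chart = outlet_coordinates(scores)
print(same_bag)
print(chart)
```

Because the two example sentences map to the same word counts, any model that sees only those counts cannot tell them apart, whereas a sequence model such as an LSTM reads the words in order.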