Paper Title
The Pushshift Telegram Dataset
Paper Authors
Paper Abstract
Messaging platforms, especially those with a mobile focus, have become increasingly ubiquitous in society. These mobile messaging platforms can have deceptively large user bases. In addition to being a way for people to stay in touch, they are often used to organize social movements and serve as a place for extremists and other ne'er-do-wells to congregate. In this paper, we present a dataset from one such mobile messaging platform: Telegram. Our dataset is made up of over 27.8K channels and 317M messages from 2.2M unique users. To the best of our knowledge, our dataset is the largest and most complete of its kind. In addition to the raw data, we also provide the source code used to collect it, allowing researchers to run their own data collection instance. We believe the Pushshift Telegram dataset can help researchers from a variety of disciplines interested in studying online social movements, protests, political extremism, and disinformation.