Paper Title
Popularity Prediction for Social Media over Arbitrary Time Horizons
Paper Authors
Abstract
Predicting the popularity of social media content in real time requires approaches that operate efficiently at global scale. Popularity prediction is important for many applications, including detection of harmful viral content to enable timely content moderation. The prediction task is difficult because views result from interactions between user interests, content features, resharing, feed ranking, and network structure. We consider the problem of accurately predicting popularity both at any given prediction time since a content item's creation and for arbitrary time horizons into the future. To achieve high accuracy across different prediction time horizons, it is essential for models to use static features (of content and user) as well as the popularity growth observed up to prediction time. We propose a feature-based approach built on a self-excited Hawkes point process model, which predicts the content's popularity at one or more reference horizons in tandem with a point predictor of an effective growth parameter that reflects the timescale of popularity growth. This results in a highly scalable method for popularity prediction over arbitrary prediction time horizons. Compared to several leading baselines, the method also achieves a high degree of accuracy on a dataset of public page content on Facebook over a two-month period, covering billions of content views and hundreds of thousands of distinct content items. The model shows competitive prediction accuracy against a strong baseline that consists of separately trained models for specific prediction time horizons.
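The idea of combining a prediction at a reference horizon with an effective growth parameter can be illustrated with a minimal sketch. It assumes, purely for illustration, that the expected number of additional views over a horizon h grows in proportion to 1 - exp(-alpha * h), the kind of saturating curve that a Hawkes process with an exponential kernel produces in expectation. The function name `extrapolate_views` and its parameters are hypothetical and are not taken from the paper, which may use a different functional form.

```python
import math


def extrapolate_views(ref_increment, ref_horizon, alpha, horizon):
    """Rescale a predicted view increment from a reference horizon to another horizon.

    Illustrative assumption (not the paper's stated formula): the expected
    increment over a horizon h is proportional to 1 - exp(-alpha * h).

    ref_increment: predicted additional views over ref_horizon
    ref_horizon:   reference horizon, in the same time unit as horizon
    alpha:         effective growth rate (1 / time unit), assumed positive
    horizon:       target horizon for which a prediction is wanted
    """
    if alpha <= 0 or ref_horizon <= 0 or horizon < 0:
        raise ValueError("alpha and ref_horizon must be positive, horizon non-negative")
    # math.expm1(-x) equals exp(-x) - 1; the ratio of two such terms equals
    # (1 - exp(-alpha * horizon)) / (1 - exp(-alpha * ref_horizon)).
    scale = math.expm1(-alpha * horizon) / math.expm1(-alpha * ref_horizon)
    return ref_increment * scale


# Hypothetical inputs: 1000 additional views predicted over a 24-hour reference
# horizon, with an assumed growth rate of 0.1 per hour.
for h in (1, 6, 24, 72):
    print(h, extrapolate_views(1000.0, 24.0, 0.1, h))
```

Under this assumption, a single pair of predictions (the reference increment and the growth rate) yields a prediction for any horizon, which is what removes the need for a separately trained model per horizon.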