Title
CTM -- A Model for Large-Scale Multi-View Tweet Topic Classification
Authors
Abstract
Automatically associating social media posts with topics is an important prerequisite for effective search and recommendation on many social media platforms. However, topic classification of such posts is quite challenging because of (a) a large topic space, (b) short text with weak topical cues, and (c) multiple topic associations per post. In contrast to most prior work, which focuses only on classifying posts into a small number of topics ($10$-$20$), we consider the task of large-scale topic classification in the context of Twitter, where the topic space is $10$ times larger and each Tweet can potentially be associated with multiple topics. We address the challenges above by proposing a novel neural model, CTM, that (a) supports a large topic space of $300$ topics and (b) takes a holistic approach to Tweet content modeling, leveraging multi-modal content, author context, and deeper semantic cues in the Tweet. Our method offers an effective way to classify Tweets into topics at scale, yielding superior performance to other approaches (a relative lift of $\mathbf{20}\%$ in median average precision score), and has been successfully deployed in production at Twitter.
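To make the reported metric concrete, the following is a minimal sketch of how a median average precision score could be computed for a multi-label topic classifier. The abstract does not spell out the exact evaluation protocol, so this sketch assumes that average precision is computed separately for each topic and that the median is then taken across topics. The function names `average_precision` and `median_average_precision`, and the toy data, are illustrative and not taken from the paper.

```python
from statistics import median


def average_precision(labels, scores):
    """Average precision for a single topic.

    labels: 0/1 relevance of each Tweet for this topic.
    scores: model scores for the same Tweets (higher = more confident).
    Returns None when the topic has no positive Tweets (AP is undefined).
    """
    # Rank Tweets from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # Precision at the rank of each positive Tweet.
            precisions.append(hits / rank)
    if not precisions:
        return None
    return sum(precisions) / len(precisions)


def median_average_precision(label_matrix, score_matrix):
    """Median over topics of per-topic average precision (assumed protocol).

    label_matrix[i][t] is 1 if Tweet i belongs to topic t, else 0.
    score_matrix[i][t] is the model score of Tweet i for topic t.
    Topics without any positive Tweet are skipped.
    """
    num_topics = len(label_matrix[0])
    per_topic = []
    for t in range(num_topics):
        topic_labels = [row[t] for row in label_matrix]
        topic_scores = [row[t] for row in score_matrix]
        ap = average_precision(topic_labels, topic_scores)
        if ap is not None:
            per_topic.append(ap)
    return median(per_topic)


# Toy example: 4 Tweets, 3 topics, multiple topics allowed per Tweet.
toy_labels = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
toy_scores = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.3], [0.3, 0.7, 0.6], [0.4, 0.1, 0.5]]
print(median_average_precision(toy_labels, toy_scores))
```

Under this assumed protocol, a relative lift of $20\%$ means the new model's median score is $1.2$ times the baseline's median score, not $20$ percentage points higher.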