Paper Title
Random-walk Based Generative Model for Classifying Document Networks
Paper Authors
Abstract
Document networks are found in various collections of real-world data, such as citation networks, hyperlinked web pages, and online social networks. A large number of generative models have been proposed because they offer an intuitive and useful picture for analyzing document networks. Prominent examples are relational topic models, in which documents are linked according to their topic similarities. However, existing generative models do not make full use of network structure because they depend largely on topic modeling of the documents. In particular, the centrality of graph nodes is missing from the generative processes of previous models. In this paper, we propose a novel generative model for document networks that introduces random walkers on the network to integrate node centrality into the link generation process. The developed method is evaluated on semi-supervised classification tasks with real-world citation networks. We show that the proposed model outperforms existing probabilistic approaches, especially in detecting communities in connected networks.
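The abstract does not spell out how a random walker encodes node centrality, so the following is only a minimal sketch of the general idea and not the model proposed in the paper. It assumes an undirected, connected toy network (the graph `toy_citations` and the function `random_walk_centrality` are hypothetical names introduced here) and approximates the stationary distribution of a lazy random walk, which for such a graph is proportional to node degree.

```python
def random_walk_centrality(adjacency, steps=200):
    """Approximate the stationary distribution of a lazy random walk.

    `adjacency` maps each node to a list of its neighbors. The graph is
    assumed to be undirected and connected, with no isolated nodes.
    """
    nodes = sorted(adjacency)
    # Start the walker at a uniformly random node.
    prob = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(steps):
        # Lazy walk: stay put with probability 0.5 (avoids periodicity).
        nxt = {node: 0.5 * prob[node] for node in nodes}
        for node in nodes:
            # Otherwise move to a uniformly chosen neighbor.
            share = 0.5 * prob[node] / len(adjacency[node])
            for neighbor in adjacency[node]:
                nxt[neighbor] += share
        prob = nxt
    return prob


# Hypothetical toy network: a triangle A-B-C with a pendant document D on A.
toy_citations = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B"],
    "D": ["A"],
}

centrality = random_walk_centrality(toy_citations)
for node in sorted(centrality, key=centrality.get, reverse=True):
    print(node, round(centrality[node], 3))
```

In this toy graph the walker spends the most time at the best-connected document, which is the sense in which a random walk can carry centrality information into a link generation step.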