论文标题
了解嵌入大规模图的软糖
Understanding Coarsening for Embedding Large-Scale Graphs
论文作者
论文摘要
当今的数据中很大一部分,例如,社交网络,Web连接等,可以通过图形建模。对机器学习(ML)算法的适当分析有可能对许多研究和工业领域产生深远的见解。但是,图数据的不规则结构构成了在链接预测,节点分类和异常检测等图上运行ML任务的障碍。图形嵌入是一个计算密集型过程,将图表表示为D维空间中的一组向量,这又使其适合ML任务。文献中已经提出了许多方法,以提高图形嵌入的性能,例如,使用分布式算法,加速器和预处理技术。图形粗化可以被视为预处理步骤,是给定的大图的结构近似,具有较小的图。正如文献所暗示的那样,嵌入的成本大大降低时,采用变高。在这项工作中,我们从速度和准确性方面彻底分析了粗化质量对嵌入性能的影响。我们对最先进的快速嵌入工具进行的实验表明,所做的粗糙决策与嵌入质量之间存在相互作用。
A significant portion of the data today, e.g, social networks, web connections, etc., can be modeled by graphs. A proper analysis of graphs with Machine Learning (ML) algorithms has the potential to yield far-reaching insights into many areas of research and industry. However, the irregular structure of graph data constitutes an obstacle for running ML tasks on graphs such as link prediction, node classification, and anomaly detection. Graph embedding is a compute-intensive process of representing graphs as a set of vectors in a d-dimensional space, which in turn makes it amenable to ML tasks. Many approaches have been proposed in the literature to improve the performance of graph embedding, e.g., using distributed algorithms, accelerators, and pre-processing techniques. Graph coarsening, which can be considered a pre-processing step, is a structural approximation of a given, large graph with a smaller one. As the literature suggests, the cost of embedding significantly decreases when coarsening is employed. In this work, we thoroughly analyze the impact of the coarsening quality on the embedding performance both in terms of speed and accuracy. Our experiments with a state-of-the-art, fast graph embedding tool show that there is an interplay between the coarsening decisions taken and the embedding quality.