Paper title
Principled approach to the selection of the embedding dimension of networks
Paper authors
Paper abstract
Network embedding is a general-purpose machine learning technique that encodes network structure in vector spaces of tunable dimension. Choosing an appropriate embedding dimension, small enough to be efficient and large enough to be effective, is challenging but necessary to generate embeddings applicable to a multitude of tasks. Existing strategies for selecting the embedding dimension rely on maximizing performance in downstream tasks. Here, we propose a principled method that selects the dimension such that all structural information of a network is parsimoniously encoded. The method is validated on various embedding algorithms and a large corpus of real-world networks. The embedding dimensions selected by our method for real-world networks suggest that efficient encoding in low-dimensional spaces is usually possible.