Paper Title
Metric Distribution to Vector: Constructing Data Representation via Broad-Scale Discrepancies
Paper Authors
Paper Abstract
Graph embedding provides a feasible methodology for pattern classification of graph-structured data by mapping each data sample into a vector space. Most pioneering works are essentially encoding methods that concentrate on a vectorial representation of the inner properties of a graph, such as its topological constitution, node attributes, and link relations. However, classifying each target sample is a qualitative issue that depends on understanding the overall discrepancies at the scale of the whole dataset. From a statistical point of view, these discrepancies manifest as a metric distribution over the dataset when a distance metric is adopted to measure pairwise similarity or dissimilarity. Therefore, we present a novel embedding strategy named $\mathbf{MetricDistribution2vec}$ to extract such distribution characteristics into the vectorial representation of each sample. We demonstrate the application and effectiveness of our representation method in supervised prediction tasks on extensive real-world structural graph datasets. The results show unexpected improvements over a large set of baselines on all datasets, even when lightweight models are used as classifiers. Moreover, we also evaluate the proposed method in few-shot classification scenarios, where it still shows attractive discriminative ability when inference is based on scarce training samples.
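To make the idea of a per-sample "metric distribution" concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the distance metric or how the distribution is summarized, so this example assumes Euclidean distance on toy feature vectors (standing in for a graph distance) and a fixed-bin histogram as the summary. The function name `metric_distribution_embedding` and the parameter `n_bins` are hypothetical.

```python
import numpy as np


def metric_distribution_embedding(features, n_bins=8):
    """Embed each sample as a normalized histogram of its distances to all other samples.

    Assumption: Euclidean distance on feature vectors; a graph distance
    could be substituted by replacing the `dist` matrix below.
    """
    x = np.asarray(features, dtype=float)
    n = x.shape[0]

    # Pairwise distance matrix of shape (n, n), computed via broadcasting.
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Mask that excludes each sample's zero distance to itself.
    off_diag = ~np.eye(n, dtype=bool)

    # Shared bin edges over the whole dataset so histograms are comparable.
    edges = np.linspace(0.0, dist[off_diag].max(), n_bins + 1)

    emb = np.zeros((n, n_bins))
    for i in range(n):
        counts, _ = np.histogram(dist[i, off_diag[i]], bins=edges)
        emb[i] = counts / counts.sum()  # empirical distribution for sample i
    return emb


# Toy data: two well-separated clusters of 10 points each in 3 dimensions.
rng = np.random.default_rng(0)
toy = np.vstack([rng.normal(0, 1, (10, 3)), rng.normal(5, 1, (10, 3))])
embedding = metric_distribution_embedding(toy)
```

Each row of `embedding` is a probability vector describing how one sample's distances to the rest of the dataset are distributed. Under these assumptions, such vectors could be passed to any lightweight classifier, which is the role the abstract assigns to the learned representation.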