Paper Title

ScatterSample: Diversified Label Sampling for Data Efficient Graph Neural Network Learning

Paper Authors

Zhenwei Dai, Vasileios Ioannidis, Soji Adeshina, Zak Jost, Christos Faloutsos, George Karypis

Paper Abstract

What target labels are most effective for graph neural network (GNN) training? In some applications where GNNs excel, such as drug design or fraud detection, labeling new instances is expensive. We develop a data-efficient active sampling framework, ScatterSample, to train GNNs under an active learning setting. ScatterSample employs a sampling module termed DiverseUncertainty to collect instances with large uncertainty from different regions of the sample space for labeling. To ensure diversification of the selected nodes, DiverseUncertainty clusters the high-uncertainty nodes and selects a representative node from each cluster. Our ScatterSample algorithm is further supported by rigorous theoretical analysis demonstrating its advantage over standard active sampling methods that simply maximize uncertainty without diversifying the samples. In particular, we show that ScatterSample is able to efficiently reduce the model uncertainty over the whole sample space. Our experiments on five datasets show that ScatterSample significantly outperforms the other GNN active learning baselines; in particular, it reduces the sampling cost by up to 50% while achieving the same test accuracy.
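The abstract describes DiverseUncertainty as a two-step selection: keep the high-uncertainty nodes, then cluster them and label one representative per cluster. Below is a minimal, hedged sketch of such a selection round in plain NumPy. The function name `diverse_uncertainty`, the entropy-based uncertainty score, the pool size of `5 * budget`, and the k-means details are all illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def diverse_uncertainty(probs, embeddings, budget, n_iter=10, seed=0):
    """Illustrative sketch of a DiverseUncertainty-style labeling step.

    probs:      (N, C) predicted class probabilities per node
    embeddings: (N, D) node embeddings from the current GNN
    budget:     number of nodes to label this round
    Returns indices (into the N nodes) of the selected representatives.
    """
    rng = np.random.default_rng(seed)

    # 1) Uncertainty: predictive entropy of each node's class distribution.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)

    # 2) Keep a pool of the most uncertain nodes (pool size is an assumption).
    pool = np.argsort(entropy)[::-1][: 5 * budget]
    X = embeddings[pool]

    # 3) Cluster the pool into `budget` clusters with a basic k-means loop.
    centers = X[rng.choice(len(X), size=budget, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for k in range(budget):
            members = X[assign == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)

    # 4) Representative per cluster: the pool node nearest each center.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    reps = np.unique(dists.argmin(axis=0))  # dedupe if two centers collide
    return pool[reps]
```

Because the representatives are spread across clusters of the embedding space rather than being the top-`budget` entropy scores, the selected batch covers different regions of the sample space, which is the diversification effect the abstract argues reduces model uncertainty globally.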
