Paper Title
Debiased Graph Neural Networks with Agnostic Label Selection Bias
Paper Authors
Paper Abstract
Most existing Graph Neural Networks (GNNs) are proposed without considering the selection bias in data, i.e., the inconsistent distribution between the training and test sets. In reality, the test data is not even available during the training process, making the selection bias agnostic. Training GNNs on biasedly selected nodes leads to significant parameter estimation bias and greatly impairs the generalization ability on test nodes. In this paper, we first present an experimental investigation which clearly shows that selection bias drastically hinders the generalization ability of GNNs, and we theoretically prove that selection bias causes biased estimation of GNN parameters. Then, to remove the bias in GNN estimation, we propose a novel Debiased Graph Neural Network (DGNN) with a differentiated decorrelation regularizer. The differentiated decorrelation regularizer estimates a sample weight for each labeled node such that the spurious correlations among the learned embeddings can be eliminated. We analyze the regularizer from a causal view, which motivates us to differentiate the weights of the variables based on their contribution to the confounding bias. These sample weights are then used to reweight GNNs to eliminate the estimation bias, thus helping to improve the stability of prediction on unknown test nodes. Comprehensive experiments are conducted on several challenging graph datasets with two kinds of label selection biases. The results verify that our proposed model outperforms state-of-the-art methods and that DGNN is a flexible framework for enhancing existing GNNs.
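The core mechanism the abstract describes — learning a per-sample weight for each labeled node so that the weighted covariances between embedding dimensions (the spurious correlations) shrink — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names `decorrelation_loss` and `learn_sample_weights` are hypothetical, the regularizer here treats all variable pairs equally (it omits the paper's causal differentiation of weights), and a crude finite-difference optimizer stands in for end-to-end training of the GNN.

```python
import numpy as np

rng = np.random.default_rng(0)

def decorrelation_loss(Z, w):
    """Sum of squared weighted covariances between distinct pairs of
    embedding dimensions (a simplified, undifferentiated regularizer)."""
    p = w / w.sum()                       # normalized sample weights
    Zc = Z - p @ Z                        # weighted centering per dimension
    cov = (Zc * p[:, None]).T @ Zc        # weighted covariance matrix
    off = cov - np.diag(np.diag(cov))     # keep only cross-dimension terms
    return float(np.sum(off ** 2))

def learn_sample_weights(Z, steps=200, lr=0.2, eps=1e-4):
    """Minimize the decorrelation loss over positive per-sample weights,
    using finite-difference gradient descent on log-weights so the
    weights stay positive; only loss-improving steps are accepted."""
    n = Z.shape[0]
    log_w = np.zeros(n)                   # start from uniform weights
    best = decorrelation_loss(Z, np.exp(log_w))
    for _ in range(steps):
        grad = np.zeros(n)
        for i in range(n):                # finite-difference gradient
            log_w[i] += eps
            grad[i] = (decorrelation_loss(Z, np.exp(log_w)) - best) / eps
            log_w[i] -= eps
        cand = log_w - lr * grad
        cand_loss = decorrelation_loss(Z, np.exp(cand))
        if cand_loss < best:              # accept only improving steps
            log_w, best = cand, cand_loss
        else:
            lr *= 0.5                     # otherwise shrink the step size
    return np.exp(log_w)

# Toy "embeddings" with a spurious correlation: dimension 1 is a noisy
# copy of dimension 0, dimension 2 is independent noise.
base = rng.normal(size=(60, 1))
Z = np.hstack([base, base + 0.3 * rng.normal(size=(60, 1)),
               rng.normal(size=(60, 1))])

w = learn_sample_weights(Z)
print(decorrelation_loss(Z, np.ones(len(Z))))  # unweighted loss
print(decorrelation_loss(Z, w))                # reweighted loss, lower
```

In the full method, these learned weights would then reweight each labeled node's term in the GNN training loss, so that parameter estimation is performed under the decorrelated (reweighted) distribution rather than the biased one.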