Paper Title

Compressing Deep Graph Neural Networks via Adversarial Knowledge Distillation

Paper Authors

Huarui He, Jie Wang, Zhanqiu Zhang, Feng Wu

Paper Abstract

Deep graph neural networks (GNNs) have been shown to be expressive for modeling graph-structured data. Nevertheless, the over-stacked architecture of deep graph models makes it difficult to deploy and rapidly test on mobile or embedded systems. To compress over-stacked GNNs, knowledge distillation via a teacher-student architecture turns out to be an effective technique, where the key step is to measure the discrepancy between teacher and student networks with predefined distance functions. However, using the same distance for graphs of various structures may be unfit, and the optimal distance formulation is hard to determine. To tackle these problems, we propose a novel Adversarial Knowledge Distillation framework for graph models named GraphAKD, which adversarially trains a discriminator and a generator to adaptively detect and decrease the discrepancy. Specifically, noticing that the well-captured inter-node and inter-class correlations favor the success of deep GNNs, we propose to criticize the inherited knowledge from node-level and class-level views with a trainable discriminator. The discriminator distinguishes between teacher knowledge and what the student inherits, while the student GNN works as a generator and aims to fool the discriminator. To our best knowledge, GraphAKD is the first to introduce adversarial training to knowledge distillation in graph domains. Experiments on node-level and graph-level classification benchmarks demonstrate that GraphAKD improves the student performance by a large margin. The results imply that GraphAKD can precisely transfer knowledge from a complicated teacher GNN to a compact student GNN.
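
To make the teacher-student setup described in the abstract concrete, below is a minimal PyTorch sketch of adversarial knowledge distillation for node classification: a frozen teacher GNN provides "real" node-level representations and class-level logits, a compact student GNN acts as the generator, and two trainable critics try to tell teacher outputs from what the student inherits. The `GCN` and `Critic` modules, the loss weight, and all names below are illustrative assumptions for exposition, not GraphAKD's exact architectures or hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GCN(nn.Module):
    """Two-layer GCN over a pre-normalized dense adjacency matrix (illustrative)."""

    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, n_classes)

    def forward(self, adj, x):
        h = F.relu(adj @ self.w1(x))   # node-level representations
        logits = adj @ self.w2(h)      # class-level predictions
        return h, logits


class Critic(nn.Module):
    """Scores whether a representation comes from the teacher ("real") or the student ("fake")."""

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.LeakyReLU(0.2), nn.Linear(dim, 1))

    def forward(self, z):
        return self.net(z)  # one score per node


def bce(scores, real):
    """Binary cross-entropy against an all-real or all-fake target."""
    target = torch.ones_like(scores) if real else torch.zeros_like(scores)
    return F.binary_cross_entropy_with_logits(scores, target)


def train_step(adj, x, labels, teacher, student, critic_node, critic_cls,
               opt_student, opt_critic):
    # The frozen teacher provides the "real" samples for both critics.
    with torch.no_grad():
        t_h, t_logits = teacher(adj, x)
    s_h, s_logits = student(adj, x)

    # 1) Discriminator step: tell teacher knowledge apart from what the student inherits,
    #    from both the node-level and the class-level view.
    opt_critic.zero_grad()
    d_loss = (bce(critic_node(t_h), True) + bce(critic_node(s_h.detach()), False)
              + bce(critic_cls(t_logits), True) + bce(critic_cls(s_logits.detach()), False))
    d_loss.backward()
    opt_critic.step()

    # 2) Generator step: the student tries to fool both critics while fitting the labels.
    opt_student.zero_grad()
    adv_loss = bce(critic_node(s_h), True) + bce(critic_cls(s_logits), True)
    task_loss = F.cross_entropy(s_logits, labels)
    (task_loss + 0.5 * adv_loss).backward()  # 0.5 is an illustrative weight, not from the paper
    opt_student.step()
```

In this sketch `adj` is assumed to be a symmetrically normalized adjacency matrix; in the paper's setting the teacher would be a deep, over-stacked GNN and the student a compact one, with the critics trained jointly against the student in the min-max game the abstract describes.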
