Paper Title
Model Degradation Hinders Deep Graph Neural Networks
Paper Authors
Paper Abstract
Graph Neural Networks (GNNs) have achieved great success in various graph mining tasks. However, drastic performance degradation is always observed when a GNN is stacked with many layers. As a result, most GNNs only have shallow architectures, which limits their expressive power and their exploitation of deep neighborhoods. Most recent studies attribute the performance degradation of deep GNNs to the \textit{over-smoothing} issue. In this paper, we disentangle the conventional graph convolution operation into two independent operations: \textit{Propagation} (\textbf{P}) and \textit{Transformation} (\textbf{T}). Following this, the depth of a GNN can be split into the propagation depth ($D_p$) and the transformation depth ($D_t$). Through extensive experiments, we find that the major cause of the performance degradation of deep GNNs is the \textit{model degradation} issue caused by large $D_t$, rather than the \textit{over-smoothing} issue mainly caused by large $D_p$. Further, we present \textit{Adaptive Initial Residual} (AIR), a plug-and-play module compatible with all kinds of GNN architectures, to alleviate the \textit{model degradation} and \textit{over-smoothing} issues simultaneously. Experimental results on six real-world datasets demonstrate that GNNs equipped with AIR outperform most GNNs with shallow architectures owing to the benefits of both large $D_p$ and large $D_t$, while the time costs associated with AIR are negligible.
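To make the P/T decoupling and the AIR idea concrete, below is a minimal PyTorch sketch based only on the abstract. The class name `DecoupledGNN`, the per-node sigmoid gate over the concatenation of the propagated features and the initial features, and the choice to apply AIR only during the propagation stage are all assumptions for illustration; the paper's actual formulation of AIR may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupledGNN(nn.Module):
    """Illustrative sketch (not the authors' released code): a GNN with
    decoupled Propagation (P) and Transformation (T) depths, plus an
    AIR-style adaptive initial residual applied at each propagation step.
    `adj_norm` is assumed to be a normalized sparse adjacency matrix.
    """
    def __init__(self, in_dim, hid_dim, out_dim, d_p, d_t):
        super().__init__()
        # D_t stacked feature transformations (the T operations).
        dims = [in_dim] + [hid_dim] * (d_t - 1) + [out_dim]
        self.transforms = nn.ModuleList(
            nn.Linear(dims[i], dims[i + 1]) for i in range(d_t)
        )
        # One learned gate per propagation step for the adaptive
        # initial-residual mixing (an assumed, simple instantiation).
        self.gates = nn.ModuleList(
            nn.Linear(2 * in_dim, 1) for _ in range(d_p)
        )

    def forward(self, x, adj_norm):
        # --- P: D_p parameter-free propagation steps with AIR ---
        h0, h = x, x
        for gate in self.gates:
            p = torch.sparse.mm(adj_norm, h)              # one P step
            alpha = torch.sigmoid(gate(torch.cat([p, h0], dim=-1)))
            h = alpha * p + (1 - alpha) * h0              # mix back toward h0
        # --- T: D_t transformation steps ---
        for i, lin in enumerate(self.transforms):
            h = lin(h)
            if i < len(self.transforms) - 1:
                h = F.relu(h)
        return h
```

One appeal of this decoupled design is that $D_p$ can be made large almost for free: the P steps are parameter-free apart from the small gates, so deep neighborhoods can be exploited without deepening the learned transformation stack ($D_t$), which the abstract identifies as the source of model degradation.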