Abstract
Graph neural networks (GNNs) have recently achieved remarkable performance in representation learning on graph-structured data. However, as the number of network layers increases, the performance of GNNs based on neighborhood aggregation degrades because of oversmoothing, which is the major bottleneck for applying GNNs to real-world graphs. Many efforts have been made to improve the aggregation of feature information from directly connected nodes, i.e., breadth exploration. However, these models perform best only with three or fewer layers, and their performance drops rapidly as more layers are added. To alleviate oversmoothing, we propose a nested graph attention network (NGAT), which can work in a semi-supervised manner. In addition to breadth exploration, a k-layer NGAT uses a layer-wise aggregation strategy guided by the attention mechanism to selectively leverage feature information from the kth-order neighborhood, i.e., depth exploration. Even with a 10-layer or deeper architecture, NGAT can balance the need to preserve locality (including root node features and the local structure) with the need to aggregate information from a large neighborhood. In a number of experiments on standard node classification tasks, NGAT outperforms other recent models and achieves state-of-the-art performance.
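Chinese abstract (translated)
In recent years, graph neural networks (GNNs) have achieved notable results in representation learning on graph-structured data. However, as the number of network layers grows, the performance of GNNs based on neighborhood aggregation degrades because of oversmoothing, which is also the main bottleneck for applying GNNs to real-world graphs. Researchers have made many improvements to the aggregation of feature information from directly connected nodes, i.e., breadth exploration. However, these models perform best only with three or fewer layers, and their performance drops quickly in deeper settings. To alleviate oversmoothing, this paper proposes a nested graph attention network, NGAT, a multi-scale feature fusion model based on a dual attention mechanism, which can work in a semi-supervised manner. In addition to breadth exploration, a k-layer NGAT uses an attention-guided layer-wise aggregation strategy to selectively exploit feature information from the kth-order neighborhood, i.e., depth exploration. Even with an architecture of 10 or more layers, NGAT can balance the need to preserve locality (including root node features and local structure) with the need to aggregate information from a large neighborhood. The paper compares NGAT with existing GNN models on public datasets, and the experiments show that NGAT has a stronger ability to learn node embeddings.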
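The two ideas in the abstract, oversmoothing under repeated neighborhood aggregation and attention over depths, can be illustrated with a small NumPy sketch. This is not the authors' NGAT implementation: it assumes a weight-free random-walk propagation step, a toy path graph, and a hypothetical scoring vector `query` standing in for learned attention parameters. All function names here are illustrative.

```python
import numpy as np


def path_graph(n):
    """Adjacency matrix of an undirected path with n nodes (toy example)."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return adj


def random_walk_matrix(adj):
    """Row-normalized adjacency with self-loops: D^-1 (A + I)."""
    a_hat = adj + np.eye(adj.shape[0])
    return a_hat / a_hat.sum(axis=1, keepdims=True)


def propagate(adj, features, num_layers):
    """Return [H^0, ..., H^k] with H^l = P^l X (assumption: no weights or nonlinearity)."""
    p = random_walk_matrix(adj)
    outputs = [features]
    for _ in range(num_layers):
        outputs.append(p @ outputs[-1])
    return outputs


def spread(h):
    """Mean distance of node representations from their average row."""
    return float(np.linalg.norm(h - h.mean(axis=0), axis=1).mean())


def depth_attention(layer_outputs, query):
    """Per-node softmax over depths; `query` is a hypothetical scoring vector."""
    stacked = np.stack(layer_outputs, axis=1)            # (nodes, depths, dim)
    scores = stacked @ query                             # (nodes, depths)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    combined = (weights[:, :, None] * stacked).sum(axis=1)
    return combined, weights
```

Calling `propagate(path_graph(6), np.eye(6), 50)` and comparing `spread` on the first and last outputs shows node representations collapsing toward a common vector as depth grows, which is the oversmoothing effect. `depth_attention` then lets each node weight the representations from different depths instead of using only the deepest one. In a trained model the scoring would be learned rather than fixed.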
Author information
Contributions
Jianke HU and Yin ZHANG designed the research. Jianke HU processed the data and drafted the paper. Yin ZHANG helped organize the paper. Jianke HU and Yin ZHANG revised and finalized the paper.
Ethics declarations
Jianke HU and Yin ZHANG declare that they have no conflict of interest.
Additional information
Project supported by the China Knowledge Centre for Engineering Sciences and Technology (CKCEST)
About this article
Cite this article
Hu, J., Zhang, Y. NGAT: attention in breadth and depth exploration for semi-supervised graph representation learning. Front Inform Technol Electron Eng 23, 409–421 (2022). https://doi.org/10.1631/FITEE.2000657
DOI: https://doi.org/10.1631/FITEE.2000657