Abstract
Graph attention networks are effective graph neural networks for semi-supervised learning that consider a node's neighbors when learning its embedding. This paper presents a novel attention-based graph neural network that applies an attention mechanism to the word-level features of a node and incorporates the attention of its neighbors into the embedding process. Instead of representing a node by a single vector, as traditional graph attention networks do, the proposed method represents each node by a 2D matrix in which every row corresponds to a different attention distribution over the node's original word-level features. The compressed features are then fed into a graph attention layer, which aggregates the matrix representations of a node and its neighbors with different attention weights into a new representation. By stacking several graph attention layers, the method obtains the final matrix representations of nodes, which capture both the varying importance of a node's neighbors and the varying importance of the words in its original features. Experimental results on three citation network datasets show that the proposed method significantly outperforms eight state-of-the-art methods on semi-supervised classification tasks.
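To make the two-stage design concrete, the following is a minimal PyTorch sketch of the idea described above: a structured word-level self-attention that compresses each node's word features into an r × d matrix, followed by a GAT-style layer that aggregates the matrix representations of a node and its neighbors. The class names (`WordAttention`, `MatrixGATLayer`), the scoring scheme, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WordAttention(nn.Module):
    """Compress each node's word features into an r x d matrix,
    one row per attention distribution over the node's words."""

    def __init__(self, word_dim, hidden_dim, num_rows):
        super().__init__()
        self.W1 = nn.Linear(word_dim, hidden_dim, bias=False)
        self.W2 = nn.Linear(hidden_dim, num_rows, bias=False)

    def forward(self, H):  # H: (num_nodes, num_words, word_dim)
        # A[:, :, k] is the k-th attention distribution over the words.
        A = F.softmax(self.W2(torch.tanh(self.W1(H))), dim=1)  # (N, L, r)
        return A.transpose(1, 2) @ H  # (N, r, word_dim)


class MatrixGATLayer(nn.Module):
    """GAT-style neighbor aggregation where each node is an r x d matrix.
    Attention scores are computed from a per-node summary vector
    (row mean of the matrix) -- an assumed simplification."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, M, adj):  # M: (N, r, in_dim), adj: (N, N) with self-loops
        Z = self.W(M)                # (N, r, out_dim)
        s = Z.mean(dim=1)            # (N, out_dim) summary used for scoring
        N = s.size(0)
        pairs = torch.cat([s.unsqueeze(1).expand(N, N, -1),
                           s.unsqueeze(0).expand(N, N, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1), 0.2)      # (N, N)
        e = e.masked_fill(adj == 0, float('-inf'))            # keep edges only
        alpha = F.softmax(e, dim=1)                           # attention over neighbors
        out = torch.einsum('ij,jrd->ird', alpha, Z)           # aggregate matrices
        return F.elu(out)


# Toy usage: 4 nodes, 6 words each, 16-dim word vectors, ring graph + self-loops.
torch.manual_seed(0)
H = torch.randn(4, 6, 16)
adj = torch.eye(4) + torch.diag(torch.ones(3), 1) + torch.diag(torch.ones(3), -1)
M = WordAttention(word_dim=16, hidden_dim=32, num_rows=4)(H)  # (4, 4, 16)
out = MatrixGATLayer(in_dim=16, out_dim=8)(M, adj)            # (4, 4, 8)
```

Stacking several `MatrixGATLayer` instances, as the abstract describes, would yield the final matrix representations used for classification; how the paper scores neighbor pairs from matrix-valued features may differ from the row-mean shortcut used here.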
Acknowledgements
We thank the anonymous reviewers for their useful comments and suggestions. This work has been supported by the National Natural Science Foundation of China under Grants 62076130, 91846104, and 61902186, the National Key Research and Development Program of China under Grant 2018AAA0102002, the Natural Science Foundation of Jiangsu Province, China, under Grant BK20180463, the Fundamental Research Funds for the Central Universities under Grants 30920010008 and 30919011282, and the Postdoctoral Science Foundation of China under Grant 2019M651835.
Cite this article
Zhang, J., Li, M., Gao, K. et al. Word and graph attention networks for semi-supervised classification. Knowl Inf Syst 63, 2841–2859 (2021). https://doi.org/10.1007/s10115-021-01610-3