
Word and graph attention networks for semi-supervised classification

  • Regular Paper
  • Published in Knowledge and Information Systems

Abstract

Graph attention networks are effective graph neural networks that learn graph embeddings for semi-supervised learning by attending to a node's neighbors when learning its features. This paper presents a novel attention-based graph neural network that applies an attention mechanism to the word-level features of a node while also incorporating attention over its neighbors during embedding. Instead of representing each node by a single feature vector, as traditional graph attention networks do, the proposed method represents a node by a 2D matrix in which each row corresponds to a different attention distribution over the node's original word-level features. These compressed features are then fed into a graph attention layer that aggregates the matrix representations of a node and its neighbors, weighted by learned attention coefficients, into a new representation. By stacking several graph attention layers, the model obtains final matrix representations of nodes that account for both the varying importance of a node's neighbors and the varying importance of the words in its original features. Experimental results on three citation network datasets show that the proposed method significantly outperforms eight state-of-the-art methods on semi-supervised classification tasks.
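To make the two attention stages concrete, here is a minimal PyTorch sketch: a structured self-attention module (in the style of Lin et al.'s sentence embedding) compresses a node's word vectors into an r-row matrix, and a GAT-style layer then aggregates the matrix-valued features of a node's neighborhood. All names and hyperparameters (WordAttention, MatrixGATLayer, r, hidden_dim) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, assuming PyTorch; names and shapes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordAttention(nn.Module):
    """Compress a node's n word vectors into an (r, word_dim) matrix;
    each of the r rows is a separate attention distribution over words."""
    def __init__(self, word_dim, hidden_dim, r):
        super().__init__()
        self.W1 = nn.Linear(word_dim, hidden_dim, bias=False)
        self.W2 = nn.Linear(hidden_dim, r, bias=False)

    def forward(self, words):                          # words: (n, word_dim)
        scores = self.W2(torch.tanh(self.W1(words)))   # (n, r)
        alpha = F.softmax(scores, dim=0)               # r attention maps over words
        return alpha.t() @ words                       # (r, word_dim)

class MatrixGATLayer(nn.Module):
    """GAT-style aggregation over matrix-valued node features: each node
    combines its neighbors' matrices with learned attention weights."""
    def __init__(self, in_dim, out_dim, r):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a_src = nn.Linear(r * out_dim, 1, bias=False)
        self.a_dst = nn.Linear(r * out_dim, 1, bias=False)

    def forward(self, H, adj):
        # H: (N, r, in_dim) node matrices; adj: (N, N) 0/1 adjacency,
        # self-loops included so every node also attends to itself.
        Wh = self.W(H)                                 # (N, r, out_dim)
        flat = Wh.flatten(start_dim=1)                 # (N, r*out_dim)
        e = F.leaky_relu(self.a_src(flat) + self.a_dst(flat).t())  # (N, N)
        e = e.masked_fill(adj == 0, float('-inf'))     # restrict to edges
        alpha = F.softmax(e, dim=1)                    # attention over neighbors
        return torch.einsum('ij,jrd->ird', alpha, Wh)  # (N, r, out_dim)

# Usage (shapes only): word matrices -> node matrices -> one attention layer.
wa = WordAttention(word_dim=300, hidden_dim=64, r=4)
gat = MatrixGATLayer(in_dim=300, out_dim=64, r=4)
docs = [torch.randn(n, 300) for n in (12, 7, 20)]      # 3 nodes' word vectors
adj = torch.eye(3); adj[0, 1] = adj[1, 0] = 1          # toy graph + self-loops
H = torch.stack([wa(d) for d in docs])                 # (3, 4, 300)
out = F.elu(gat(H, adj))                               # (3, 4, 64)
```

In a full model one would stack several such layers and feed the final node matrices to a classifier trained on the labeled nodes, matching the semi-supervised setup the abstract describes.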




Acknowledgements

We thank the anonymous reviewers for their very useful comments and suggestions. This work was supported by the National Natural Science Foundation of China under Grants 62076130, 91846104, and 61902186; the National Key Research and Development Program of China under Grant 2018AAA0102002; the Natural Science Foundation of Jiangsu Province, China, under Grant BK20180463; the Fundamental Research Funds for the Central Universities under Grants 30920010008 and 30919011282; and the China Postdoctoral Science Foundation under Grant 2019M651835.

Author information

Corresponding authors

Correspondence to Jing Zhang or Cangqi Zhou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Zhang, J., Li, M., Gao, K. et al. Word and graph attention networks for semi-supervised classification. Knowl Inf Syst 63, 2841–2859 (2021). https://doi.org/10.1007/s10115-021-01610-3
