Attention Models in Graphs: A Survey

Published: 11 November 2019

Abstract

Graph-structured data arise naturally in many different application domains. By representing data as graphs, we can capture entities (i.e., nodes) as well as their relationships (i.e., edges) with each other. Many useful insights can be derived from graph-structured data, as demonstrated by an ever-growing body of work focused on graph mining. However, in the real world, graphs can be both large, with many complex patterns, and noisy, which can pose a problem for effective graph mining. An effective way to deal with this issue is to incorporate “attention” into graph mining solutions. An attention mechanism allows a method to focus on task-relevant parts of the graph, helping it to make better decisions. In this work, we conduct a comprehensive and focused survey of the literature on the emerging field of graph attention models. We introduce three intuitive taxonomies to group existing work. These are based on problem setting (type of input and output), the type of attention mechanism used, and the task (e.g., graph classification, link prediction). We motivate our taxonomies through detailed examples and use each to survey competing approaches from a unique standpoint. Finally, we highlight several challenges in the area and discuss promising directions for future work.

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 13, Issue 6
December 2019
282 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3366748
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 November 2019
Accepted: 01 August 2019
Revised: 01 July 2019
Received: 01 July 2018
Published in TKDD Volume 13, Issue 6

Author Tags

  1. Attention mechanism
  2. deep learning
  3. graph attention
  4. graph attention survey

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 368
  • Downloads (Last 6 weeks): 31
Reflects downloads up to 17 Jan 2025

Cited By

  • (2025) ExGAT: Context extended graph attention neural network. Neural Networks 181 (106784). DOI: 10.1016/j.neunet.2024.106784. Online publication date: Jan-2025
  • (2025) GPNet: Simplifying graph neural networks via multi-channel geometric polynomials. Information Sciences 694 (121696). DOI: 10.1016/j.ins.2024.121696. Online publication date: Mar-2025
  • (2025) Beyond Users: Denoising Behavior-based Contrastive Learning for Disentangled Cross-Domain Recommendation. Database Systems for Advanced Applications (163-178). DOI: 10.1007/978-981-97-5779-4_11. Online publication date: 11-Jan-2025
  • (2024) Carbon emissions forecasting based on temporal graph transformer-based attentional neural network. Journal of Computational Methods in Sciences and Engineering 24:3 (1405-1421). DOI: 10.3233/JCM-247139. Online publication date: 17-Jun-2024
  • (2024) From graph convolution networks to graph scattering networks: a survey. Journal of Image and Graphics 29:1 (45-64). DOI: 10.11834/jig.230069. Online publication date: 2024
  • (2024) Foundations of spatial perception for robotics. International Journal of Robotics Research 43:10 (1457-1505). DOI: 10.1177/02783649241229725. Online publication date: 1-Sep-2024
  • (2024) SPORT: A Subgraph Perspective on Graph Classification with Label Noise. ACM Transactions on Knowledge Discovery from Data 18:9 (1-20). DOI: 10.1145/3687468. Online publication date: 6-Nov-2024
  • (2024) VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-modal Information Retrieval. ACM Transactions on Knowledge Discovery from Data 18:9 (1-21). DOI: 10.1145/3686805. Online publication date: 18-Oct-2024
  • (2024) Learning Individual Treatment Effects under Heterogeneous Interference in Networks. ACM Transactions on Knowledge Discovery from Data 18:8 (1-21). DOI: 10.1145/3673761. Online publication date: 16-Aug-2024
  • (2024) DHyper: A Recurrent Dual Hypergraph Neural Network for Event Prediction in Temporal Knowledge Graphs. ACM Transactions on Information Systems 42:5 (1-23). DOI: 10.1145/3653015. Online publication date: 29-Apr-2024
