Rethinking Structural Encodings: Adaptive Graph Transformer for Node Classification Task

Published: 30 April 2023

Abstract

Graph Transformers equipped with elaborate Positional Encodings (PEs) have proven advantageous in graph data mining, especially on graph-level tasks. Their potential for node classification, however, has not been fully exploited yet. On this task, existing Graph Transformers with PEs are limited by two issues: (i) PEs describe only a node's positional identity, which is insufficient for node classification on complex graphs, where a full portrayal of local node properties is needed; (ii) PEs are integrated with Transformers in a fixed schema, ignoring local patterns that may vary from node to node. In this paper, we propose the Adaptive Graph Transformer (AGT) to address these issues. AGT consists of a Learnable Centrality Encoding and a Kernelized Local Structure Encoding, which extract structural patterns from the centrality and subgraph views in a learnable and scalable manner. We further design the Adaptive Transformer Block to adaptively integrate attention scores and Structural Encodings in a node-specific manner. AGT achieves state-of-the-art performance on nine real-world web graphs (up to 1.6 million nodes) and shows outstanding results on two series of synthetic graphs spanning ranges of heterophily levels and noise ratios.
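The paper itself defines these modules precisely; as a purely illustrative sketch of the node-specific integration idea behind the Adaptive Transformer Block, the following PyTorch snippet gates a precomputed structural-encoding score matrix into the attention logits with a learned per-node, per-head gate. All names, shapes, and the sigmoid-gate form are assumptions for illustration, not the authors' implementation.

    # Hypothetical sketch (not the paper's code): node-adaptive fusion of
    # attention scores with a precomputed structural-encoding bias.
    import torch
    import torch.nn as nn

    class AdaptiveAttentionSketch(nn.Module):
        """Self-attention in which a per-node gate controls how strongly a
        structural-encoding score matrix biases the attention logits."""

        def __init__(self, dim: int, num_heads: int = 2):
            super().__init__()
            assert dim % num_heads == 0
            self.num_heads = num_heads
            self.head_dim = dim // num_heads
            self.qkv = nn.Linear(dim, 3 * dim)
            self.out = nn.Linear(dim, dim)
            # One gate value per head, computed from each query node's own
            # features, so the structural bias is mixed in node-specifically.
            self.gate = nn.Linear(dim, num_heads)

        def forward(self, x: torch.Tensor, se_bias: torch.Tensor) -> torch.Tensor:
            # x: (N, dim) node features; se_bias: (N, N) structural scores,
            # standing in for centrality/subgraph-kernel encodings.
            n, _ = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            q = q.view(n, self.num_heads, self.head_dim).transpose(0, 1)  # (H, N, d)
            k = k.view(n, self.num_heads, self.head_dim).transpose(0, 1)
            v = v.view(n, self.num_heads, self.head_dim).transpose(0, 1)
            logits = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5     # (H, N, N)
            g = torch.sigmoid(self.gate(x)).T.unsqueeze(-1)               # (H, N, 1)
            attn = (logits + g * se_bias.unsqueeze(0)).softmax(dim=-1)    # gated bias
            out = (attn @ v).transpose(0, 1).reshape(n, -1)               # (N, dim)
            return self.out(out)

    # Toy usage: 5 nodes, 8-dim features, random structural-encoding scores.
    x, se_bias = torch.randn(5, 8), torch.randn(5, 5)
    print(AdaptiveAttentionSketch(dim=8, num_heads=2)(x, se_bias).shape)  # (5, 8)

In AGT proper, the bias would come from the Learnable Centrality Encoding and the Kernelized Local Structure Encoding described in the paper; a random matrix stands in here only to keep the sketch self-contained.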




Published In

WWW '23: Proceedings of the ACM Web Conference 2023
April 2023
4293 pages
ISBN:9781450394161
DOI:10.1145/3543507


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 April 2023


Author Tags

  1. Graph Transformer
  2. Node Classification
  3. Structural Encoding

Qualifiers

  • Research-article
  • Research
  • Refereed limited


Conference

WWW '23: The ACM Web Conference 2023
April 30 - May 4, 2023
Austin, TX, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 358
  • Downloads (Last 6 weeks): 28
Reflects downloads up to 20 Jan 2025


Cited By

  • (2025) AutoMTNAS: Automated meta-reinforcement learning on graph tokenization for graph neural architecture search. Knowledge-Based Systems, 310, 113023. https://doi.org/10.1016/j.knosys.2025.113023. Online publication date: Feb-2025.
  • (2024) Heterogeneous Subgraph Transformer for Fake News Detection. Proceedings of the ACM Web Conference 2024, 1272-1282. https://doi.org/10.1145/3589334.3645680. Online publication date: 13-May-2024.
  • (2024) Rethinking Node-wise Propagation for Large-scale Graph Learning. Proceedings of the ACM Web Conference 2024, 560-569. https://doi.org/10.1145/3589334.3645450. Online publication date: 13-May-2024.
  • (2024) A Node Importance Evaluation Method Based on Graph-Transformer. 2024 8th International Conference on Communication and Information Systems (ICCIS), 197-202. https://doi.org/10.1109/ICCIS63642.2024.10779437. Online publication date: 18-Oct-2024.
  • (2024) A Comprehensive Survey on Deep Graph Representation Learning. Neural Networks, 173, 106207. https://doi.org/10.1016/j.neunet.2024.106207. Online publication date: May-2024.
  • (2023) Improving Graph Domain Adaptation with Network Hierarchy. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2249-2258. https://doi.org/10.1145/3583780.3614928. Online publication date: 21-Oct-2023.