
Enhancing Out-of-distribution Generalization on Graphs via Causal Attention Learning

Published: 26 March 2024

Abstract

In graph classification, attention- and pooling-based graph neural networks (GNNs) predominate, extracting salient features from the input graph to support prediction. They mostly follow the paradigm of “learning to attend,” which maximizes the mutual information between the attended graph and the ground-truth label. However, this paradigm causes GNN classifiers to indiscriminately absorb all statistical correlations between input features and labels in the training data, without distinguishing the causal from the noncausal effects of features. Rather than emphasizing causal features, the attended graphs tend to rely on noncausal features as shortcuts to prediction. These shortcut features may easily change outside the training distribution, leading to poor generalization of GNN classifiers. In this article, we take a causal view of GNN modeling. Under our causal assumption, the shortcut feature acts as a confounder between the causal feature and the prediction. It misleads the classifier into learning spurious correlations that facilitate prediction on in-distribution (ID) test data while causing a significant performance drop on out-of-distribution (OOD) test data. To address this issue, we employ the backdoor adjustment from causal theory: combining each causal feature with various shortcut features to identify causal patterns and mitigate the confounding effect. Specifically, we use attention modules to estimate the causal and shortcut features of the input graph. A memory bank then collects the estimated shortcut features, enriching the diversity of shortcut features available for combination. Simultaneously, we apply a prototype strategy to improve the consistency of intra-class causal features. We term our method CAL+; it promotes stable relationships between causal estimation and prediction, regardless of distribution changes.
Extensive experiments on synthetic and real-world OOD benchmarks demonstrate our method’s effectiveness in improving OOD generalization. Our code is released at https://github.com/shuyao-wang/CAL-plus.
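The backdoor-adjustment step described in the abstract can be sketched in plain Python. This is a hypothetical, minimal illustration, not the authors’ implementation: CAL+ operates on attention-derived soft masks inside a GNN, whereas here features are plain vectors, `combine` is a simple element-wise sum, and `ShortcutMemoryBank`, `backdoor_adjusted_logits`, and the FIFO capacity are illustrative names and choices.

```python
import random

class ShortcutMemoryBank:
    """Fixed-size FIFO bank collecting estimated shortcut features (here: plain lists)."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.bank = []

    def push(self, shortcut_feat):
        self.bank.append(shortcut_feat)
        if len(self.bank) > self.capacity:
            self.bank.pop(0)  # discard the oldest shortcut feature

    def sample(self, k):
        # Draw up to k distinct stored shortcut features at random.
        return random.sample(self.bank, min(k, len(self.bank)))

def combine(causal_feat, shortcut_feat):
    # Simplest possible "intervention": element-wise sum of the two feature vectors.
    return [c + s for c, s in zip(causal_feat, shortcut_feat)]

def backdoor_adjusted_logits(causal_feat, bank, classifier, k=3):
    """Average the classifier's outputs over the causal feature paired with k
    sampled shortcut features, stratifying over the shortcut confounder so the
    prediction depends on the causal part rather than any one shortcut."""
    combos = [combine(causal_feat, s) for s in bank.sample(k)]
    logits = [classifier(x) for x in combos]
    n = len(logits)
    return [sum(l[i] for l in logits) / n for i in range(len(logits[0]))]
```

For example, with two stored shortcut features and a toy linear classifier, the averaged logits reflect the causal feature consistently across shortcut pairings; in the paper this averaging is what discourages the classifier from latching onto any single spurious pattern.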


Cited By

  • (2025) A Simple Data Augmentation for Graph Classification: A Perspective of Equivariance and Invariance. ACM Transactions on Knowledge Discovery from Data 19, 2 (2025), 1–24. DOI: 10.1145/3706062
  • (2025) Casual inference-enabled graph neural networks for generalized fault diagnosis in industrial IoT system. Information Sciences 694 (2025), 121719. DOI: 10.1016/j.ins.2024.121719
  • (2024) Causal Invariant Hierarchical Molecular Representation for Out-of-distribution Molecular Property Prediction. In 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 759–766. DOI: 10.1109/BIBM62325.2024.10822583
  • (2024) A survey of out-of-distribution generalization for graph machine learning from a causal view. AI Magazine 45, 4 (2024), 537–548. DOI: 10.1002/aaai.12202

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 5
June 2024
699 pages
EISSN: 1556-472X
DOI: 10.1145/3613659

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 March 2024
Online AM: 05 February 2024
Accepted: 16 January 2024
Revised: 19 November 2023
Received: 07 June 2023
Published in TKDD Volume 18, Issue 5


Author Tags

  1. Graph learning
  2. attention mechanism
  3. out-of-distribution generalization

Qualifiers

  • Research-article

Funding Sources

  • National Key Research and Development Program of China
  • National Natural Science Foundation of China

Article Metrics

  • Downloads (last 12 months): 747
  • Downloads (last 6 weeks): 88

Reflects downloads up to 07 Mar 2025