Research Article
DOI: 10.1145/3394486.3403063

NodeAug: Semi-Supervised Node Classification with Data Augmentation

Published: 20 August 2020

Abstract

Using Data Augmentation (DA), we present a new method to enhance Graph Convolutional Networks (GCNs), which are the state-of-the-art models for semi-supervised node classification. DA for graph data remains under-explored: because of the connections built by edges, DA operations on different nodes influence each other and lead to undesired results, such as uncontrollable DA magnitudes and changes to ground-truth labels. To address this issue, we present the NodeAug (Node-Parallel Augmentation) scheme, which creates a 'parallel universe' for each node in which to conduct DA, blocking the undesired effects from other nodes. NodeAug regularizes the model prediction of every node (including unlabeled ones) to be invariant to the changes induced by DA, so as to improve effectiveness. To augment the input features from different aspects, we propose three DA strategies that modify both the node attributes and the graph structure. In addition, we introduce subgraph mini-batch training for an efficient implementation of NodeAug: each iteration takes as input the subgraph corresponding to the receptive fields of a batch of nodes, rather than the whole graph used by prior full-batch training. Empirically, NodeAug yields significant gains for strong GCN models on Cora, Citeseer, Pubmed, and two co-authorship networks, with a more efficient training process thanks to the proposed subgraph mini-batch training approach.
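The core mechanism summarized above is a consistency-regularization objective: the model's prediction for each node should remain invariant under DA applied within that node's own 'parallel universe'. Below is a minimal, hypothetical sketch of such an objective in PyTorch; the names model, augment, x, adj, y, labeled_mask, and lam are illustrative assumptions, and the sketch is not the authors' implementation.

# Hypothetical sketch of DA-based consistency regularization in the spirit of
# NodeAug (not the authors' code). Assumes a PyTorch GCN `model` mapping
# (features, adjacency) to per-node logits, a user-supplied `augment(x, adj)`
# returning an augmented view, labels `y`, and a boolean `labeled_mask`.
import torch.nn.functional as F

def nodeaug_style_loss(model, x, adj, y, labeled_mask, augment, lam=1.0):
    # Supervised cross-entropy on the labeled nodes only.
    logits = model(x, adj)
    sup_loss = F.cross_entropy(logits[labeled_mask], y[labeled_mask])

    # Consistency term over every node (labeled or not): predictions on the
    # augmented view should match the detached predictions on the original
    # view; KL divergence is one common choice for this penalty.
    x_aug, adj_aug = augment(x, adj)
    logits_aug = model(x_aug, adj_aug)
    p_orig = F.softmax(logits, dim=-1).detach()
    log_p_aug = F.log_softmax(logits_aug, dim=-1)
    cons_loss = F.kl_div(log_p_aug, p_orig, reduction="batchmean")

    return sup_loss + lam * cons_loss

In NodeAug, the three DA strategies (modifying node attributes and the graph structure) would play the role of the augment function, and the subgraph mini-batch scheme would restrict each call to the receptive-field subgraph of the current batch of nodes rather than the whole graph.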

Supplementary Material

MP4 File (3394486.3403063.mp4)
This video presents our paper, 'NodeAug: Semi-Supervised Node Classification with Data Augmentation'.

Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020
3664 pages
ISBN:9781450379984
DOI:10.1145/3394486

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data augmentation
  2. graph convolutional networks
  3. graph mining
  4. semi-supervised learning

Funding Sources

  • NUS ODPRT Grant
  • Singapore Ministry of Education Academic Research Fund Tier 3 under MOE's official grant

Conference

KDD '20

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
