DistGNN: scalable distributed training for large-scale graph neural networks

Authors:
Vasimuddin Md (Intel Corporation),
Sanchit Misra (Intel Corporation),
Guixiang Ma (Intel Corporation),
Ramanarayan Mohanty (Intel Corporation),
Evangelos Georganas (Intel Corporation),
Alexander Heinecke (Intel Corporation),
Dhiraj Kalamkar (Intel Corporation),
Nesreen K. Ahmed (Intel Corporation),
Sasikanth Avancha (Intel Corporation)

SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2021
Article No.: 76
Pages 1–14
https://doi.org/10.1145/3458817.3480856

Published: 13 November 2021

ABSTRACT

Full-batch training of Graph Neural Networks (GNNs) to learn the structure of large graphs is a critical problem that must scale to hundreds of compute nodes to be feasible. It is challenging because of the large memory capacity and bandwidth requirements on a single compute node and the high communication volume across multiple nodes. In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters through an efficient shared-memory implementation, communication reduction using a minimum vertex-cut graph partitioning algorithm, and communication avoidance using a family of delayed-update algorithms. Our results on four common GNN benchmark datasets (Reddit, OGB-Products, OGB-Papers, and Proteins) show up to 3.7× speed-up using a single CPU socket and up to 97× speed-up using 128 CPU sockets, both over baseline DGL implementations running on a single CPU socket.
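
To make the vertex-cut idea concrete, the following minimal Python sketch illustrates the general concept only; it is not the partitioner used in DistGNN, which the abstract does not describe. In a vertex-cut partition each edge is assigned to exactly one partition, and a vertex whose edges land in several partitions is replicated in each of them. The average number of replicas per vertex (the replication factor) is a common proxy for how much data must be synchronized between partitions. The function names and the toy 4-cycle graph are assumptions made for illustration.

```python
from collections import defaultdict


def vertex_replicas(edge_to_part):
    """Map each vertex to the set of partitions that hold one of its edges."""
    replicas = defaultdict(set)
    for (u, v), part in edge_to_part.items():
        replicas[u].add(part)
        replicas[v].add(part)
    return dict(replicas)


def replication_factor(replicas):
    """Average number of partitions holding a copy of each vertex."""
    return sum(len(parts) for parts in replicas.values()) / len(replicas)


def split_vertices(replicas):
    """Vertices replicated in more than one partition (these need syncing)."""
    return sorted(v for v, parts in replicas.items() if len(parts) > 1)


# Toy graph (assumed): the 4-cycle 0-1-2-3-0, split across two partitions.
# Both assignments place two edges in each partition.
contiguous = {(0, 1): 0, (1, 2): 0, (2, 3): 1, (3, 0): 1}
alternating = {(0, 1): 0, (1, 2): 1, (2, 3): 0, (3, 0): 1}

for name, assignment in [("contiguous", contiguous), ("alternating", alternating)]:
    reps = vertex_replicas(assignment)
    print(name, replication_factor(reps), split_vertices(reps))
```

Both assignments are equally balanced in edge count, but the contiguous one replicates only two vertices while the alternating one replicates all four. A minimum vertex-cut partitioner aims for assignments of the first kind, since fewer replicated vertices generally means less cross-partition communication.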

Published in
SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2021
1493 pages
ISBN: 9781450384421
DOI: 10.1145/3458817
General Chair: Bronis R. de Supinski
Program Chairs: Mary Hall, Todd Gamblin
Copyright © 2021 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Artifacts Evaluated & Functional / v1.1
- Artifacts Available / v1.1
Author Tags
deep graph library
deep learning
distributed algorithm
graph neural networks
graph partition
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%