DOI: 10.1145/3447786.3456233

DGCL: an efficient communication library for distributed GNN training

Published: 21 April 2021

Abstract

Graph neural networks (GNNs) have gained increasing popularity in many areas such as e-commerce, social networks and bio-informatics. Distributed GNN training is essential for handling large graphs and reducing the execution time. However, for distributed GNN training, a peer-to-peer communication strategy suffers from high communication overheads. Also, different GPUs require different remote vertex embeddings, which leads to an irregular communication pattern and renders existing communication planning solutions unsuitable. We propose the distributed graph communication library (DGCL) for efficient GNN training on multiple GPUs. At the heart of DGCL is a communication planning algorithm tailored for GNN training, which jointly considers fully utilizing fast links, fusing communication, avoiding contention and balancing loads on different links. DGCL can be easily adopted to extend existing single-GPU GNN systems to distributed training. We conducted extensive experiments on different datasets and network configurations to compare DGCL with alternative communication schemes. In our experiments, DGCL reduces the communication time of peer-to-peer communication by 77.5% on average and the training time for an epoch by up to 47%.

Published In

EuroSys '21: Proceedings of the Sixteenth European Conference on Computer Systems
April 2021
631 pages
ISBN:9781450383349
DOI:10.1145/3447786

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. distributed and parallel training
  2. graph neural networks
  3. network communication

Qualifiers

  • Research-article

Conference

EuroSys '21: Sixteenth European Conference on Computer Systems
April 26 - 28, 2021
Online Event, United Kingdom

Acceptance Rates

EuroSys '21 paper acceptance rate: 38 of 181 submissions (21%).
Overall acceptance rate: 241 of 1,308 submissions (18%).

Article Metrics

  • Downloads (Last 12 months)181
  • Downloads (Last 6 weeks)24
Reflects downloads up to 05 Mar 2025


Cited By

  • (2025) Adaptive Parallel Training for Graph Neural Networks. Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 29-42. DOI: 10.1145/3710848.3710883. Online publication date: 28-Feb-2025.
  • (2025) From Sancus to Sancus^q: staleness and quantization-aware full-graph decentralized training in graph neural networks. The VLDB Journal 34, 2. DOI: 10.1007/s00778-024-00897-2. Online publication date: 31-Jan-2025.
  • (2025) Distributed Temporal Graph Neural Network Learning over Large-Scale Dynamic Graphs. Database Systems for Advanced Applications, 51-66. DOI: 10.1007/978-981-97-5779-4_4. Online publication date: 11-Jan-2025.
  • (2024) NeutronTP: Load-Balanced Distributed Full-Graph GNN Training with Tensor Parallelism. Proceedings of the VLDB Endowment 18, 2, 173-186. DOI: 10.14778/3705829.3705837. Online publication date: 1-Oct-2024.
  • (2024) Efficient Training of Graph Neural Networks on Large Graphs. Proceedings of the VLDB Endowment 17, 12, 4237-4240. DOI: 10.14778/3685800.3685844. Online publication date: 8-Nov-2024.
  • (2024) OUTRE: An OUT-of-Core De-REdundancy GNN Training Framework for Massive Graphs within A Single Machine. Proceedings of the VLDB Endowment 17, 11, 2960-2973. DOI: 10.14778/3681954.3681976. Online publication date: 30-Aug-2024.
  • (2024) TIGER: Training Inductive Graph Neural Network for Large-Scale Knowledge Graph Reasoning. Proceedings of the VLDB Endowment 17, 10, 2459-2472. DOI: 10.14778/3675034.3675039. Online publication date: 1-Jun-2024.
  • (2024) FreshGNN: Reducing Memory Access via Stable Historical Embeddings for Graph Neural Network Training. Proceedings of the VLDB Endowment 17, 6, 1473-1486. DOI: 10.14778/3648160.3648184. Online publication date: 3-May-2024.
  • (2024) Comprehensive Evaluation of GNN Training Systems: A Data Management Perspective. Proceedings of the VLDB Endowment 17, 6, 1241-1254. DOI: 10.14778/3648160.3648167. Online publication date: 3-May-2024.
  • (2024) A survey of graph convolutional networks (GCNs) in FPGA-based accelerators. Journal of Big Data 11, 1. DOI: 10.1186/s40537-024-01022-4. Online publication date: 11-Nov-2024.
