
EDGES: An Efficient Distributed Graph Embedding System on GPU Clusters


Abstract:

Graph embedding training models access parameters sparsely, in a “one-hot” manner. Currently, distributed graph embedding neural networks are trained with data parallelism on a parameter server, an approach that suffers from significant performance and scalability problems. In this article, we analyze, for the first time, the problems and characteristics of training this kind of model on distributed GPU clusters, and find that fixed model parameters scattered across different machine nodes are a major factor limiting efficiency. Based on this observation, we develop an efficient distributed graph embedding system called EDGES, which uses GPU clusters to train large graph models with billions of nodes and trillions of edges through both data and model parallelism. Within the system, we propose a novel dynamic partition architecture for training these models that reduces communication by at least half compared with existing training systems. According to our evaluations on real-world networks, our system delivers competitive accuracy for the trained embeddings and significantly accelerates training of the graph node embedding neural network, achieving speedups of 7.23x on a single node and 18.6x on multiple nodes over the fastest existing training system. As for scalability, our experiments show that EDGES achieves a nearly linear speedup.
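The abstract's two key observations, that parameters are accessed in a “one-hot” manner and that fixed parameters scattered across machines limit efficiency, can be illustrated with a minimal Python sketch. This is not the EDGES implementation. The functions `one_hot`, `dense_product`, `lookup`, `owner`, and `remote_fraction` are hypothetical, and the row-modulo placement in `owner` is an assumed fixed partition chosen only for illustration. The sketch shows that multiplying a one-hot vector by an embedding table reduces to reading a single row, so each training sample touches only a few rows. Under a fixed placement, many of those rows live on other workers and must be fetched remotely.

```python
def one_hot(index, size):
    """Return a one-hot vector of length `size` with 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def dense_product(one_hot_vec, table):
    """Compute one_hot_vec^T @ table by visiting every row (dense, wasteful)."""
    dim = len(table[0])
    out = [0.0] * dim
    for weight, row in zip(one_hot_vec, table):
        for j in range(dim):
            out[j] += weight * row[j]
    return out


def lookup(index, table):
    """Sparse equivalent of the one-hot product: read exactly one row."""
    return list(table[index])


def owner(node_id, num_workers):
    """Hypothetical fixed partition: embedding row i lives on worker i % num_workers."""
    return node_id % num_workers


def remote_fraction(edges, worker, num_workers):
    """Fraction of embedding rows touched by `edges` that live on another worker."""
    touched = {v for edge in edges for v in edge}
    remote = sum(1 for v in touched if owner(v, num_workers) != worker)
    return remote / len(touched)


# Usage: a tiny 3-node table with 2-dimensional embeddings.
table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(dense_product(one_hot(1, 3), table))  # same row as lookup(1, table)
print(lookup(1, table))

# With a fixed 2-worker placement, worker 0 training on this edge batch
# must fetch half of the embedding rows it touches from the other worker.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(remote_fraction(edges, worker=0, num_workers=2))
```

Because the dense product and the row lookup agree, a training step only needs the rows named by its sampled edges. The remote fraction then depends on where those rows are stored, which is why the abstract identifies a static parameter placement, rather than the computation itself, as the bottleneck.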
Published in: IEEE Transactions on Parallel and Distributed Systems (Volume: 32, Issue: 7, 01 July 2021)
Page(s): 1892–1902
Date of Publication: 27 November 2020


