
CL-SGD: Efficient Communication by Clustering and Local-SGD for Distributed Machine Learning


Abstract:

Training a deep neural network model requires frequent communication between machines, and heavy communication traffic limits the scalability of distributed machine learning training. Some works try to reduce the communication traffic by transmitting clustered gradients. However, the granularity of gradient clustering in these works is relatively coarse, which may decrease the accuracy and stability of the model. Moreover, our experiments reveal that gradients of the same type exhibit a certain degree of correlation, which means that the gradients should be clustered in a fine-grained way. In this article, we propose the Cluster and Local Stochastic Gradient Descent (CL-SGD) scheme, which combines a type-by-type gradient clustering method with a local training scheme under a master-slave node architecture. CL-SGD has two key designs. First, fully accounting for the differences between each type of gradient, we propose the type-by-type gradient clustering method, which clusters each type of gradient separately while combining it with the local training scheme, to significantly reduce the communication traffic. Second, we use a master-slave node architecture to reduce the model accuracy loss caused by gradient clustering. Experiment results show that CL-SGD achieves a 1500x compression ratio and reduces training time by up to 51% compared with BSP, Local-SGD, STL-SGD, and ClusterGrad.
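The abstract's core idea, clustering each type (e.g., each layer) of gradient separately and transmitting only cluster centroids plus assignment indices, can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D k-means routine, the per-layer dictionary, and all names below are illustrative assumptions about how such a compressor might look.

```python
import numpy as np

def cluster_gradients(grad, k=4, iters=10):
    """1-D k-means over one gradient tensor: each value is mapped to its
    nearest centroid, so only k float centroids plus small integer
    assignment indices need to be transmitted instead of the full tensor."""
    flat = grad.ravel()
    # Initialize centroids spread evenly over the value range.
    centroids = np.linspace(flat.min(), flat.max(), k)
    for _ in range(iters):
        # Assign each gradient value to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Recompute each centroid as its cluster mean (skip empty clusters).
        for j in range(k):
            if np.any(idx == j):
                centroids[j] = flat[idx == j].mean()
    return centroids, idx.astype(np.uint8)

def decode(centroids, idx, shape):
    """Reconstruct an approximate gradient tensor from the compressed form."""
    return centroids[idx].reshape(shape)

# Type-by-type clustering: each parameter tensor is clustered on its own,
# rather than pooling all gradients into one coarse clustering.
grads = {"conv1.weight": np.random.randn(8, 3), "fc.bias": np.random.randn(8)}
compressed = {name: cluster_gradients(g) for name, g in grads.items()}
```

A worker would send `compressed` to the master, which decodes each entry with `decode` before aggregation; the compression ratio grows with tensor size since only `k` centroids and byte-sized indices cross the network.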
Date of Conference: 28 May 2023 - 01 June 2023
Date Added to IEEE Xplore: 23 October 2023
Electronic ISSN: 1938-1883
Conference Location: Rome, Italy


