Abstract:
Training a deep neural network model requires frequent communication between machines, and heavy communication traffic limits the scalability of distributed machine learning training. Some works try to reduce the communication traffic by transmitting clustered gradients. However, the granularity of gradient clustering in these works is relatively coarse, which may decrease the accuracy and stability of the model. Moreover, our experiments reveal that gradients of the same type have a certain degree of correlation, which means that gradients should be clustered in a fine-grained way. In this article, we propose the Cluster and Local Stochastic Gradient Descent (CL-SGD) scheme, which combines a type-by-type gradient clustering method with a local training scheme under a master-slave node architecture. CL-SGD has two key designs. First, fully taking into account the differences among the types of gradients, we propose a type-by-type gradient clustering method that clusters each type of gradient separately while combining it with the local training scheme, significantly reducing communication traffic. Second, we use a master-slave node architecture to reduce the model accuracy loss caused by gradient clustering. Experimental results show that CL-SGD achieves a 1500x compression ratio and reduces training time by up to 51% compared with BSP, Local-SGD, STL-SGD, and ClusterGrad.
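To illustrate the idea of type-by-type gradient clustering described above, the following is a minimal sketch, not the authors' implementation: it assumes each "type" of gradient corresponds to one named parameter tensor, uses a simple 1-D k-means as the clustering step, and transmits only centroids plus per-element cluster indices. The function names (`cluster_gradient`, `compress_gradients`, `decompress`) and the choice of k-means are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of type-by-type gradient clustering:
# each parameter tensor ("type" of gradient, e.g. one layer's weights) is
# clustered separately, and only the centroids plus the per-element cluster
# indices would be sent from a worker to the master node.
import numpy as np

def cluster_gradient(grad, k=4, iters=10):
    """Quantize one gradient tensor to k centroid values (hypothetical helper)."""
    flat = grad.ravel()
    # Initialize centroids spread over this gradient type's value range.
    centroids = np.linspace(flat.min(), flat.max(), k)
    for _ in range(iters):
        # Assign each gradient element to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of its assigned elements.
        for c in range(k):
            if np.any(idx == c):
                centroids[c] = flat[idx == c].mean()
    return centroids, idx.astype(np.uint8), grad.shape

def compress_gradients(named_grads, k=4):
    """Cluster each gradient type separately (type-by-type), rather than all at once."""
    return {name: cluster_gradient(g, k) for name, g in named_grads.items()}

def decompress(compressed):
    """Master-side reconstruction: replace each index with its centroid value."""
    return {name: centroids[idx].reshape(shape)
            for name, (centroids, idx, shape) in compressed.items()}

# Example usage with two hypothetical gradient types (layers).
grads = {"conv1.weight": np.random.randn(16, 3, 3, 3).astype(np.float32),
         "fc.weight": np.random.randn(10, 128).astype(np.float32)}
restored = decompress(compress_gradients(grads, k=4))
```

Because each tensor is clustered on its own, the centroids adapt to that gradient type's value distribution, which is the fine-grained behavior the abstract argues for; the communication cost per tensor is k floats plus one small index per element instead of one full-precision value per element.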
Date of Conference: 28 May 2023 - 01 June 2023
Date Added to IEEE Xplore: 23 October 2023
Electronic ISSN: 1938-1883