Abstract:
Training a deep neural network model requires frequent communication between machines, and heavy communication traffic limits the scalability of distributed machine learning training. Some works try to reduce the communication traffic by transmitting clustered gradients. However, the granularity of gradient clustering in these works is relatively coarse, which may decrease the accuracy and stability of the model. Moreover, our experiments reveal that gradients of the same type have a certain degree of correlation, which means that gradients should be clustered in a fine-grained way. In this article, we propose the Cluster and Local Stochastic Gradient Descent (CL-SGD) scheme, which combines a type-by-type gradient clustering method with a local training scheme under a master-slave node architecture. CL-SGD has two key designs. First, fully taking into account the differences among the types of gradients, we propose a type-by-type gradient clustering method that clusters each type of gradient separately while combining it with the local training scheme, significantly reducing communication traffic. Second, we use a master-slave node architecture to reduce the model accuracy loss caused by gradient clustering. Experimental results show that CL-SGD achieves a 1500x compression ratio and reduces training time by up to 51% compared with BSP, Local-SGD, STL-SGD, and ClusterGrad.
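To illustrate the idea of type-by-type gradient clustering described above, the following is a minimal sketch, not the authors' implementation: it assumes each "type" of gradient corresponds to one named parameter tensor, uses a simple 1-D k-means as the clustering step, and transmits only centroids plus per-element cluster indices. The function names (`cluster_gradient`, `compress_gradients`, `decompress`) and the choice of k-means are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of type-by-type gradient clustering:
# each parameter tensor ("type" of gradient, e.g. one layer's weights) is
# clustered separately, and only the centroids plus the per-element cluster
# indices would be sent from a worker to the master node.
import numpy as np

def cluster_gradient(grad, k=4, iters=10):
    """Quantize one gradient tensor to k centroid values (hypothetical helper)."""
    flat = grad.ravel()
    # Initialize centroids spread over this gradient type's value range.
    centroids = np.linspace(flat.min(), flat.max(), k)
    for _ in range(iters):
        # Assign each gradient element to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of its assigned elements.
        for c in range(k):
            if np.any(idx == c):
                centroids[c] = flat[idx == c].mean()
    return centroids, idx.astype(np.uint8), grad.shape

def compress_gradients(named_grads, k=4):
    """Cluster each gradient type separately (type-by-type), rather than all at once."""
    return {name: cluster_gradient(g, k) for name, g in named_grads.items()}

def decompress(compressed):
    """Master-side reconstruction: replace each index with its centroid value."""
    return {name: centroids[idx].reshape(shape)
            for name, (centroids, idx, shape) in compressed.items()}

# Example usage with two hypothetical gradient types (layers).
grads = {"conv1.weight": np.random.randn(16, 3, 3, 3).astype(np.float32),
         "fc.weight": np.random.randn(10, 128).astype(np.float32)}
restored = decompress(compress_gradients(grads, k=4))
```

Because each tensor is clustered on its own, the centroids adapt to that gradient type's value distribution, which is the fine-grained behavior the abstract argues for; the communication cost per tensor is k floats plus one small index per element instead of one full-precision value per element.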
Date of Conference: 28 May 2023 - 01 June 2023
Date Added to IEEE Xplore: 23 October 2023
Electronic ISSN: 1938-1883