Abstract:
Asynchronous training based on the parameter server architecture is widely used to scale DNN training to large datasets and large models. Communication has been identified as the major bottleneck when deploying DNN training on large-scale distributed deep learning systems. Recent studies reduce communication traffic through gradient sparsification and quantization. We identify three limitations in these studies. First, their gradient sparsification is guided by gradient magnitude; however, the magnitude of a gradient reflects the current optimization direction and does not indicate the significance of the corresponding parameter, which can delay updates to significant parameters. Second, quantization methods applied to the entire model often accumulate errors during gradient aggregation, since the gradients of different layers of a DNN follow different distributions. Third, previous quantization approaches are CPU-intensive and impose a heavy overhead on the server. We propose MIPD, an adaptive, layer-wise gradient sparsification framework that compresses gradients based on model interpretability and the probability distribution of the gradients. MIPD compresses each gradient according to the significance of its parameter, as defined by model interpretability. An exponential smoothing method is also proposed to compensate for the dropped gradients on the server and reduce gradient error. MIPD updates half of the parameters at each training step to reduce the CPU overhead of the server, and encodes the gradients based on their probability distribution, thereby minimizing the approximation error. Extensive experimental results on a GPU cluster indicate that the proposed framework improves DNN training performance by up to 36.2% while preserving high accuracy compared with state-of-the-art solutions.
Published in: IEEE Transactions on Parallel and Distributed Systems (Volume: 33, Issue: 11, 01 November 2022)
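
The abstract only outlines the approach; the sketch below is not the authors' MIPD algorithm but a minimal NumPy illustration of two of the ideas it mentions: layer-wise sparsification (each layer compressed on its own rather than ranking entries across the flattened model) and exponential smoothing of the dropped gradients so they are compensated at later steps. The significance scores, the keep ratio of 0.5 (echoing "update half of the parameters"), the smoothing factor alpha, and all function names are hypothetical placeholders.

import numpy as np

def sparsify_layer(grad, significance, keep_ratio=0.5):
    """Keep the gradient entries whose significance score falls in the top
    keep_ratio fraction for this layer; return the kept part and the
    dropped residual. 'significance' stands in for whatever per-parameter
    score the framework derives (e.g., from model interpretability)."""
    k = max(1, int(grad.size * keep_ratio))
    threshold = np.partition(significance.ravel(), -k)[-k]
    mask = significance >= threshold
    kept = np.where(mask, grad, 0.0)
    return kept, grad - kept

def smooth_residual(prev_residual, dropped, alpha=0.5):
    """Exponentially smooth the dropped gradients instead of discarding them."""
    return alpha * dropped + (1.0 - alpha) * prev_residual

def compress_step(grads, scores, residual_state, keep_ratio=0.5, alpha=0.5):
    """Layer-wise compression of one training step's gradients.
    grads, scores, residual_state: dicts keyed by layer name,
    each holding an array of the layer's shape."""
    sparse_grads = {}
    for name, g in grads.items():
        g = g + residual_state[name]              # re-inject the smoothed leftovers
        kept, dropped = sparsify_layer(g, scores[name], keep_ratio)
        residual_state[name] = smooth_residual(residual_state[name], dropped, alpha)
        sparse_grads[name] = kept
    return sparse_grads, residual_state

In this sketch only the nonzero entries of sparse_grads would be transmitted to the parameter server; how MIPD actually scores significance, encodes values against each layer's gradient distribution, and schedules the half-model updates is described in the paper itself.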