ABSTRACT
Deep neural network (deep learning) models have traditionally been trained on dedicated servers, after data collected from various edge devices is sent to the server. In recent years, new methodologies have emerged for training models in a distributed manner over the edge devices themselves, keeping the data on the devices. This improves data privacy and reduces training costs. One of the main challenges for such methodologies is reducing the communication costs to, and especially from, the edge devices. In this work we compare the two main methodologies used for distributed edge training: Federated Learning and Large Batch Training. For each methodology we examine its convergence rate, communication cost, and final model performance. In addition, we present two techniques for compressing the communication between the edge devices, and examine their suitability for each of the training methodologies.
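To make the two communication patterns concrete, the sketch below contrasts Federated-Averaging-style training (model weights exchanged once per round of local updates) with synchronous large-batch SGD (gradients exchanged every step) on a toy linear-regression model. This is a minimal illustration, not the paper's actual setup: the synthetic data, the model, the hyperparameters, and the helper names (make_client_data, federated_averaging, large_batch_sgd) are all assumptions made for the example.

```python
# Minimal sketch (illustrative assumptions only) of the two edge-training
# communication patterns: FedAvg-style weight exchange vs. per-step gradient
# exchange as in synchronous large-batch training.
import numpy as np

rng = np.random.default_rng(0)

def make_client_data(n_clients=4, n_samples=256, dim=10):
    """Synthetic per-client datasets, standing in for data kept on edge devices."""
    true_w = rng.normal(size=dim)
    clients = []
    for _ in range(n_clients):
        X = rng.normal(size=(n_samples, dim))
        y = X @ true_w + 0.1 * rng.normal(size=n_samples)
        clients.append((X, y))
    return clients

def grad(w, X, y):
    """Gradient of mean-squared error for a linear model."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def federated_averaging(clients, rounds=20, local_steps=10, lr=0.01, dim=10):
    """FedAvg-style training: each round, every client runs several local
    steps and only the resulting weights are uploaded and averaged
    (one upload per client per round)."""
    w = np.zeros(dim)
    for _ in range(rounds):
        local_ws = []
        for X, y in clients:
            w_local = w.copy()
            for _ in range(local_steps):
                w_local -= lr * grad(w_local, X, y)
            local_ws.append(w_local)
        w = np.mean(local_ws, axis=0)  # server aggregates client weights
    return w

def large_batch_sgd(clients, steps=200, lr=0.01, dim=10):
    """Large-batch-style training: every step, each device computes a gradient
    on its local batch and the averaged gradient is applied
    (one upload per client per step)."""
    w = np.zeros(dim)
    for _ in range(steps):
        g = np.mean([grad(w, X, y) for X, y in clients], axis=0)
        w -= lr * g
    return w

if __name__ == "__main__":
    clients = make_client_data()
    w_fed = federated_averaging(clients)
    w_lb = large_batch_sgd(clients)
    print("FedAvg weight norm:     ", np.linalg.norm(w_fed))
    print("Large-batch weight norm:", np.linalg.norm(w_lb))
```

The key difference the sketch highlights is communication frequency: Federated Averaging uploads once per round of local work, while large-batch training uploads once per optimization step, which is why compressing the per-step messages matters more in the latter case.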