AMBLE: Adjusting mini-batch and local epoch for federated learning with heterogeneous devices
Introduction
With the great success of digital technologies, including smartphones and the Internet of Things (IoT), the amount of data independently created, collected, and stored by individual devices is rapidly increasing. Various techniques are applied to discover new information or insights from the collected data; deep learning is currently the most favored among them. Although deep learning is well suited to discovering hidden meaning in large amounts of data, large and diverse datasets are required to improve model performance (i.e., accuracy). Thus, it is necessary to collect data generated by various devices or organizations to ensure this diversity. However, the privacy issues arising during this process must be addressed. To ensure data privacy, Google presented federated learning as the next generation of AI learning and proposed the federated averaging (FedAvg) algorithm [25] for training deep learning models with multiple local devices, where the data are decentralized and one centralized server updates the deep learning model. With federated learning, it is possible to obtain an effect similar to processing all of the data together, while each device processes its data independently and the distributed local data are never collected on the central server.
Federated learning allows training without data leakage in settings where data privacy must be protected, such as clinical data in hospitals. Numerous tasks are now processed on personal smartphones and desktops, making federated learning a highly efficient way to apply deep learning. The advantages of federated learning have recently been widely recognized and applied in many studies and industrial fields. Gboard [11], Google's keyboard app, uses federated learning to more accurately predict the words and emojis that are expected to be typed on tens of millions of devices. Previously, a new word was recommended only after a user had attempted to input it a few times. By applying federated learning, Gboard learns new words from the usage of thousands of users without monitoring what any individual user is typing.
Meanwhile, challenges remain to be overcome in federated learning. Li et al. [21] analyzed the challenges of federated learning in four categories: (1) Expensive communication: Federated networks with numerous devices can be slower than local clusters, so it is important to adopt an efficient communication scheme. (2) System heterogeneity: In federated learning, the system performance of each device may vary depending on its CPU, cellular connection type, and battery life. (3) Statistical heterogeneity: Related to the second challenge, the training data are not independently and identically distributed (i.e., they are non-IID) across devices. (4) Privacy concerns: The core concept of federated learning is protecting the data generated on each device by sharing only model updates; however, the updates themselves (e.g., gradient information) can still leak information and require further protection. In this paper, we address the first (expensive communication) and second (system heterogeneity) issues.
Existing studies on reducing communication overhead in federated learning mostly focus on local mini-batch stochastic gradient descent (SGD) [25] and on developing efficient communication methods [22]. In local mini-batch SGD, the local gradient is updated iteratively on the device and only periodically transferred to the centralized server. Hence, compared to a case without these methods, the model not only converges in fewer rounds but also achieves a higher accuracy, mainly because the number of communications with the centralized server is reduced. For efficient communication, in a recent study [22], Amiri et al. employed an edge computing architecture, another type of decentralized server architecture, to avoid communication bottlenecks. However, the edge server environment comprises diverse types of equipment for data processing, so the proposed method requires further investigation to verify its effectiveness.
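To make the communication-saving role of local mini-batch SGD concrete, the following is a minimal PyTorch sketch of a client-side update that runs several local epochs before communicating; the function name and arguments are ours for illustration, not from the paper.

```python
import torch
from torch.utils.data import DataLoader

def local_update(model, dataset, batch_size, local_epochs, lr):
    # Train locally for several epochs; the raw data never leave the device.
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(local_epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
    # Only the updated weights are sent to the server, so communication
    # happens once per round rather than once per gradient step.
    return model.state_dict()
```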
One of the major issues in building an efficient communication method for federated learning is how to update the gradient, that is, the choice between synchronous and asynchronous updates. Synchronous updates in federated learning are simple to implement and guarantee high accuracy but are more vulnerable to device heterogeneity [21], because faster (i.e., more powerful) devices must wait for slower ones. Stragglers can thus arise for various reasons (e.g., device heterogeneity, connection failures, and imbalanced data distribution among devices). On the other hand, asynchronous updating is one of the schemes that address the straggler problem of heterogeneous devices [8], [23] in federated learning. However, unlike the synchronous update, the asynchronous method has its own problem, namely stale gradients. Because of gradient staleness, the accuracy of a deep learning model using the asynchronous update approach is lower than that of the synchronous update approach. To avoid this degradation of model accuracy, studies [4], [5], [25], [30], [31] that use a synchronous method to update the global model in federated learning have been proposed. Additionally, synchronous updating in federated learning converges faster than asynchronous updating [6], [7], [25]. Recently, studies [24], [35] have adaptively adjusted the mini-batch size to minimize waiting time while updating the gradient synchronously; these works motivated us to leverage the adaptive mini-batch approach.
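The cost of synchronous waiting is easy to quantify. The toy calculation below, with hypothetical per-device compute times of our own choosing, shows how a single slow device dictates the round time and leaves faster devices idle.

```python
# Hypothetical per-round compute times (seconds) for five heterogeneous devices.
compute_times = [1.2, 1.3, 1.4, 1.5, 6.0]    # the last device is a straggler

# A synchronous round ends only when the slowest device reports back.
round_time = max(compute_times)               # 6.0 s

# Idle time wasted by each faster device while waiting for the straggler.
idle_times = [round_time - t for t in compute_times]
print(round_time, idle_times)                 # 6.0 [4.8, 4.7, 4.6, 4.5, 0.0]
```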
In this paper, we propose a novel federated learning scheme called Adjusting Mini-Batch and Local Epoch (AMBLE) that adaptively adjusts the local mini-batch and local epoch sizes for federated learning with heterogeneous devices. In AMBLE, we leverage synchronous updates with local mini-batch SGD to update the global model. As noted, local mini-batch SGD in federated learning reduces the communication overhead and can achieve a deep learning model with high accuracy. However, if the clients are heterogeneous devices, some straggler devices may stall the others, which then waste their computational resources (i.e., their training time increases) while waiting. To address this problem, AMBLE introduces an additional process that adjusts the local mini-batch and local epoch sizes for each device to fill the time during which most devices would otherwise stall. Consequently, each device in AMBLE has its own local epoch and local mini-batch sizes, which alleviates the straggler problem (a rough sketch of this adjustment is given after the contribution list below). However, the differing local epoch and mini-batch sizes raise further problems, namely a varying frequency of data usage and decreased accuracy. To solve these problems, we adopt linear learning rate (LR) scaling [18] to scale the learning rate according to the local mini-batch and local epoch sizes. The contributions of our study are summarized as follows:
- We propose a novel federated learning scheme, AMBLE, that adjusts the local mini-batch and local epoch sizes for heterogeneous devices. With AMBLE, we can enhance computational efficiency by removing stragglers and scale the local learning rate to improve the model convergence rate.
- We implemented a prototype of AMBLE using PyTorch and conducted empirical evaluation experiments. Through our experiments, we show the effect of learning rate scaling with adaptive local mini-batch and local epoch sizes. We also show that AMBLE performs better than FedAvg in both the non-IID and IID cases.
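As referenced above, here is a rough sketch of the kind of per-device workload adjustment AMBLE performs. The scaling rule, the throughput measurement (samples/s), the `max_batch` cap, and the split between batch size and epochs are all our assumptions for illustration; the exact rule is defined in Section 3.

```python
def adjust_workload(throughputs, base_batch, base_epochs, max_batch=256):
    """Assign each device a (mini-batch, local-epoch) pair scaled to its
    measured throughput (samples/s), so faster devices do more work per
    round instead of idling. Illustrative sketch only, not AMBLE's exact rule.
    """
    slowest = min(throughputs)
    plans = []
    for tp in throughputs:
        scale = tp / slowest                  # >= 1 for faster devices
        # Grow the mini-batch first, then spend any leftover factor on
        # extra local epochs (this split is an assumption).
        batch = min(max_batch, max(1, int(base_batch * scale)))
        leftover = scale * base_batch / batch
        epochs = max(1, round(base_epochs * leftover))
        plans.append({"batch_size": batch, "local_epochs": epochs})
    return plans

# Example: a device 3x faster than the slowest gets a 3x larger mini-batch.
print(adjust_workload([100.0, 300.0], base_batch=32, base_epochs=1))
```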
The remainder of this paper is organized as follows. In Section 2, we present the background of federated learning and related studies on local mini-batch SGD and the adaptive batch size approach. Our proposed AMBLE scheme for federated learning with heterogeneous devices is then presented in Section 3. The experimental results and an analysis of our proposed scheme are presented in Section 4. Finally, concluding remarks and future studies are discussed in Section 5.
Federated learning
Federated learning is a machine learning technique that trains a model without sharing data, a constraint that hinders the naive adaptation of conventional distributed deep learning schemes. To address the privacy issue, the general principle of federated learning is that the model is trained locally with the data of each mobile device, ensuring security and privacy [2], [3].
In a traditional distributed deep learning scheme, all calculated gradients are aggregated to the central server or the parameter server.
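As a minimal sketch of the server-side aggregation step (FedAvg-style weighted averaging [25]; the helper name and its arguments are ours):

```python
import torch

def federated_average(client_states, client_sizes):
    # Weighted average of client model weights: each client contributes
    # in proportion to its local dataset size, as in FedAvg [25].
    total = float(sum(client_sizes))
    avg = {k: torch.zeros_like(v, dtype=torch.float32)
           for k, v in client_states[0].items()}
    for state, n in zip(client_states, client_sizes):
        for k, v in state.items():
            avg[k] += v.float() * (n / total)
    return avg  # new global model state_dict, broadcast to all clients
```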
AMBLE for heterogeneous devices
In this study, we address the device heterogeneity (i.e., computational heterogeneity) challenge of federated learning, which may cause stragglers when updating the global model and degrade the training performance. To solve the straggler problem, we adaptively adjust the local mini-batch size and local epoch size (AMBLE) so that faster devices perform more computation instead of idling. We leverage linear LR scaling [18] to increase the training accuracy by redressing the imbalance between the gradients of devices.
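This snippet does not show AMBLE's exact scaling rule. A plausible sketch of linear LR scaling in the spirit of [18], treating the product of mini-batch size and local epochs as the workload factor (our assumption), is:

```python
def scale_learning_rate(base_lr, batch_size, local_epochs,
                        base_batch, base_epochs):
    # Linear LR scaling in the spirit of [18]: if a device processes k
    # times the baseline workload per round, scale its LR by k. Using
    # batch_size * local_epochs as the workload factor is our assumption,
    # not necessarily AMBLE's exact rule.
    k = (batch_size * local_epochs) / (base_batch * base_epochs)
    return base_lr * k

# Example: (batch=128, epochs=2) against a (32, 1) baseline -> 8x base LR.
print(scale_learning_rate(0.01, 128, 2, 32, 1))   # 0.08
```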
Experiment setup
To verify the effectiveness of our proposed scheme, we built a prototype of federated learning with AMBLE using PyTorch [26] and conducted empirical evaluations. For the implementation, we used Python 3.8 (3.8.0 for the Jetson series, and 3.8.8 and 3.8.12 for the servers), PyTorch 1.7.0, and torchvision 0.8.0. As the experimental environment, we configured ten heterogeneous devices: two NVIDIA Jetson TX1, one NVIDIA Jetson TX2, five NVIDIA Jetson Nano, and two servers, which consist of one GTX
Conclusion and future works
In federated learning, there is a high chance that the participating devices are heterogeneous. This heterogeneity may cause a straggler problem in federated learning owing to the imbalanced computational performance of the devices. To address the straggler problem, we proposed a novel scheme, AMBLE, that adaptively adjusts the local mini-batch size and local epoch size for heterogeneous devices in federated learning with synchronous updates. Although there have been
CRediT authorship contribution statement
Juwon Park: Conceptualization, Investigation, Methodology, Writing – original draft. Daegun Yoon: Data curation, Visualization. Sangho Yeo: Formal analysis, Investigation, Software. Sangyoon Oh: Conceptualization, Funding acquisition, Supervision, Validation, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This research was jointly supported by the Basic Science Research Program (2021R1F1A1062779) of the National Research Foundation of Korea (NRF) funded by the Ministry of Education, the supercomputing application department at the Korea Institute of Science and Technology Information (KSC-2021-CRE-0363), and the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2021-2018-0-01431) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
References (35)
- et al., A PI consensus controller for networked clocks synchronization, IFAC Proc. Vol. (2008)
- et al., Federated learning with quantized global model updates
- et al., Practical secure aggregation for federated learning on user-held data
- et al., Practical secure aggregation for privacy-preserving machine learning
- et al., Towards federated learning at scale: system design
- et al., Round-Robin synchronization: mitigating communication bottlenecks in parameter servers
- et al., Revisiting distributed synchronous SGD
- et al., High-performance distributed ML at scale through parameter server consistency models
- et al., Large scale distributed deep networks
- et al., Adaptive gradient sparsification for efficient federated learning: an online learning approach
- Federated learning for mobile keyboard prediction
- Group knowledge transfer: federated learning of large CNNs at the edge, Adv. Neural Inf. Process. Syst.
- Distilling the knowledge in a neural network
- Adam: a method for stochastic optimization
- Federated optimization: distributed optimization beyond the datacenter
- Federated optimization: distributed machine learning for on-device intelligence
- Federated learning: strategies for improving communication efficiency
Juwon Park received his B.S. in Software Engineering from Ajou University in 2019. He is currently a master's student in the Department of Artificial Intelligence at Ajou University. His research interests include big data, deep learning, and distributed and parallel computing in heterogeneous environments.
Daegun Yoon received his B.S. in Software Engineering from Ajou University in 2018. He is currently a Ph.D. student in the Department of Artificial Intelligence at Ajou University. His research interests include highly scalable graph processing, general-purpose computing on GPUs (GPGPU), and high-performance computing (HPC).
Sangho Yeo received his B.S. in Software Engineering from Ajou University in 2017. He is currently a Ph.D. student in the Department of Artificial Intelligence at Ajou University. His research interests include deep reinforcement learning, distributed deep learning, federated learning with heterogeneous devices, and distributed and parallel computing.
Sangyoon Oh received his Ph.D. from the Computer Science Department, Indiana University, Bloomington, Indiana. He is currently a professor in the School of Information and Computer Engineering, Ajou University, South Korea. Before joining Ajou University, he worked for SK Telecom, South Korea. His main research interests include high-performance computing, distributed deep learning, and graph data processing.