AMBLE: Adjusting mini-batch and local epoch for federated learning with heterogeneous devices
Introduction
With the great success of digital technologies, including smartphones and the Internet of Things (IoT), the amount of data independently created, collected, and stored by individual devices is rapidly increasing. Various techniques are applied to discover new information or insights from the collected data; deep learning is currently the most favored among them. Although deep learning is well suited to discovering hidden meaning in large amounts of data, large and diverse datasets are required to improve model performance (i.e., accuracy). Thus, it is necessary to collect data generated by various devices or organizations to ensure this diversity. However, the privacy issues arising during this process must be addressed. To ensure data privacy, Google presented federated learning as the next generation of AI learning and proposed the federated averaging (FedAvg) algorithm [25] for training deep learning models with multiple local devices, where the data are decentralized and one centralized server updates the deep learning model. With federated learning, it is possible to obtain an effect similar to processing all of the data together, while each device processes its data independently and the distributed local data are never collected on the central server.
Federated learning allows training without data leakage in settings where data privacy must be protected, such as clinical data in hospitals. Numerous tasks are now processed on personal smartphones and desktops, making federated learning a highly efficient way to apply deep learning. The advantages of federated learning have recently been widely recognized and applied in many studies and industrial fields. Gboard [11], Google's keyboard app, uses federated learning to more accurately predict the words and emojis that are expected to be typed on tens of millions of devices. Previously, a new word was recommended only after a user had attempted to input it a few times. By applying federated learning, Gboard learns new words from the usage of thousands of users without monitoring what any individual user is typing.
Meanwhile, challenges remain to be overcome in federated learning. Li et al. [21] analyzed the challenges of federated learning in four categories: (1) Expensive communication: Federated networks with numerous devices can be slower than local clusters, so it is important to adopt an efficient communication scheme. (2) System heterogeneity: In federated learning, the system performance of each device may vary depending on its CPU, cellular connection type, and battery life. (3) Statistical heterogeneity: Related to the second challenge, the training data are not independently and identically distributed (i.e., they are non-IID) across devices. (4) Privacy concerns: The core concept of federated learning is protecting the data generated on each device by sharing only model updates; however, the updates themselves (e.g., gradient information) can still leak information and require further protection. In this paper, we address the first (expensive communication) and second (system heterogeneity) issues.
Existing studies on reducing communication overhead in federated learning mostly focus on local mini-batch stochastic gradient descent (SGD) [25] and on developing efficient communication methods [22]. In local mini-batch SGD, the local gradient is updated iteratively on the device and only periodically transferred to the centralized server. Hence, compared to a case without these methods, the model not only converges in fewer rounds but also achieves a higher accuracy, mainly because the number of communications with the centralized server is reduced. For efficient communication, in a recent study [22], Amiri et al. employed an edge computing architecture, another type of decentralized server architecture, to avoid communication bottlenecks. However, the edge server environment comprises diverse types of equipment for data processing, so the proposed method requires further investigation to verify its effectiveness.
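To make the communication-saving role of local mini-batch SGD concrete, the following is a minimal PyTorch sketch of a client-side update that runs several local epochs before communicating; the function name and arguments are ours for illustration, not from the paper.

```python
import torch
from torch.utils.data import DataLoader

def local_update(model, dataset, batch_size, local_epochs, lr):
    # Train locally for several epochs; the raw data never leave the device.
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(local_epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
    # Only the updated weights are sent to the server, so communication
    # happens once per round rather than once per gradient step.
    return model.state_dict()
```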
One of the major issues in building an efficient communication method for federated learning is how to update the gradient, that is, the choice between synchronous and asynchronous updates. Synchronous updates in federated learning are simple to implement and guarantee high accuracy but are more vulnerable to device heterogeneity [21], because faster (i.e., more powerful) devices must wait for slower ones. Stragglers can thus arise for various reasons (e.g., device heterogeneity, connection failures, and imbalanced data distribution among devices). On the other hand, asynchronous updating is one of the schemes that address the straggler problem of heterogeneous devices [8], [23] in federated learning. However, unlike the synchronous update, the asynchronous method has its own problem, namely stale gradients. Because of gradient staleness, the accuracy of a deep learning model using the asynchronous update approach is lower than that of the synchronous update approach. To avoid this degradation of model accuracy, studies [4], [5], [25], [30], [31] that use a synchronous method to update the global model in federated learning have been proposed. Additionally, synchronous updating in federated learning converges faster than asynchronous updating [6], [7], [25]. Recently, studies [24], [35] have adaptively adjusted the mini-batch size to minimize waiting time while updating the gradient synchronously; these works motivated us to leverage the adaptive mini-batch approach.
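The cost of synchronous waiting is easy to quantify. The toy calculation below, with hypothetical per-device compute times of our own choosing, shows how a single slow device dictates the round time and leaves faster devices idle.

```python
# Hypothetical per-round compute times (seconds) for five heterogeneous devices.
compute_times = [1.2, 1.3, 1.4, 1.5, 6.0]    # the last device is a straggler

# A synchronous round ends only when the slowest device reports back.
round_time = max(compute_times)               # 6.0 s

# Idle time wasted by each faster device while waiting for the straggler.
idle_times = [round_time - t for t in compute_times]
print(round_time, idle_times)                 # 6.0 [4.8, 4.7, 4.6, 4.5, 0.0]
```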
In this paper, we propose a novel federated learning scheme called Adjusting Mini-Batch and Local Epoch (AMBLE) that adaptively adjusts the local mini-batch and local epoch sizes for federated learning with heterogeneous devices. In AMBLE, we leverage synchronous updates with local mini-batch SGD to update the global model. As noted, local mini-batch SGD in federated learning reduces the communication overhead and can achieve a deep learning model with high accuracy. However, if the clients are heterogeneous devices, some straggler devices may stall the others, which then waste their computational resources (i.e., their training time increases) while waiting. To address this problem, AMBLE introduces an additional process that adjusts the local mini-batch and local epoch sizes for each device to fill the time during which most devices would otherwise stall. Consequently, each device in AMBLE has its own local epoch and local mini-batch sizes, which alleviates the straggler problem (a rough sketch of this adjustment is given after the contribution list below). However, the differing local epoch and mini-batch sizes raise further problems, namely a varying frequency of data usage and decreased accuracy. To solve these problems, we adopt linear learning rate (LR) scaling [18] to scale the learning rate according to the local mini-batch and local epoch sizes. The contributions of our study are summarized as follows:
- We propose a novel federated learning scheme, AMBLE, that adjusts the local mini-batch and local epoch sizes for heterogeneous devices. With AMBLE, we can enhance computational efficiency by removing stragglers and scale the local learning rate to improve the model convergence rate.
- We implemented a prototype of AMBLE using PyTorch and conducted empirical evaluation experiments. Through our experiments, we show the effect of learning rate scaling with adaptive local mini-batch and local epoch sizes. We also show that AMBLE performs better than FedAvg in both the non-IID and IID cases.
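As referenced above, here is a rough sketch of the kind of per-device workload adjustment AMBLE performs. The scaling rule, the throughput measurement (samples/s), the `max_batch` cap, and the split between batch size and epochs are all our assumptions for illustration; the exact rule is defined in Section 3.

```python
def adjust_workload(throughputs, base_batch, base_epochs, max_batch=256):
    """Assign each device a (mini-batch, local-epoch) pair scaled to its
    measured throughput (samples/s), so faster devices do more work per
    round instead of idling. Illustrative sketch only, not AMBLE's exact rule.
    """
    slowest = min(throughputs)
    plans = []
    for tp in throughputs:
        scale = tp / slowest                  # >= 1 for faster devices
        # Grow the mini-batch first, then spend any leftover factor on
        # extra local epochs (this split is an assumption).
        batch = min(max_batch, max(1, int(base_batch * scale)))
        leftover = scale * base_batch / batch
        epochs = max(1, round(base_epochs * leftover))
        plans.append({"batch_size": batch, "local_epochs": epochs})
    return plans

# Example: a device 3x faster than the slowest gets a 3x larger mini-batch.
print(adjust_workload([100.0, 300.0], base_batch=32, base_epochs=1))
```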
The remainder of this paper is organized as follows. In Section 2, we present the background of federated learning and related studies on local mini-batch SGD and the adaptive batch size approach. Our proposed AMBLE scheme for federated learning with heterogeneous devices is then presented in Section 3. The experimental results and an analysis of our proposed scheme are presented in Section 4. Finally, concluding remarks and future studies are discussed in Section 5.
Federated learning
Federated learning is a machine learning technique that trains a model without sharing data, a constraint that hinders the naive adaptation of conventional distributed deep learning schemes. To address the privacy issue, the general principle of federated learning is that the model is trained locally with the data of each mobile device, ensuring security and privacy [2], [3].
In a traditional distributed deep learning scheme, all calculated gradients are aggregated to the central server or the parameter server.
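As a minimal sketch of the server-side aggregation step (FedAvg-style weighted averaging [25]; the helper name and its arguments are ours):

```python
import torch

def federated_average(client_states, client_sizes):
    # Weighted average of client model weights: each client contributes
    # in proportion to its local dataset size, as in FedAvg [25].
    total = float(sum(client_sizes))
    avg = {k: torch.zeros_like(v, dtype=torch.float32)
           for k, v in client_states[0].items()}
    for state, n in zip(client_states, client_sizes):
        for k, v in state.items():
            avg[k] += v.float() * (n / total)
    return avg  # new global model state_dict, broadcast to all clients
```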
AMBLE for heterogeneous devices
In this study, we address the device heterogeneity (i.e., computational heterogeneity) challenge of federated learning, which may cause stragglers when updating the global model and degrade the training performance. To solve the straggler problem, we adaptively adjust the local mini-batch size and local epoch size (AMBLE) so that faster devices perform more computation instead of idling. We leverage linear LR scaling [18] to increase the training accuracy by redressing the imbalance between the gradients of devices.
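This snippet does not show AMBLE's exact scaling rule. A plausible sketch of linear LR scaling in the spirit of [18], treating the product of mini-batch size and local epochs as the workload factor (our assumption), is:

```python
def scale_learning_rate(base_lr, batch_size, local_epochs,
                        base_batch, base_epochs):
    # Linear LR scaling in the spirit of [18]: if a device processes k
    # times the baseline workload per round, scale its LR by k. Using
    # batch_size * local_epochs as the workload factor is our assumption,
    # not necessarily AMBLE's exact rule.
    k = (batch_size * local_epochs) / (base_batch * base_epochs)
    return base_lr * k

# Example: (batch=128, epochs=2) against a (32, 1) baseline -> 8x base LR.
print(scale_learning_rate(0.01, 128, 2, 32, 1))   # 0.08
```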
Experiment setup
To verify the effectiveness of our proposed scheme, we built a prototype of federated learning with AMBLE using PyTorch [26] and conducted empirical evaluations. For the implementation, we used Python 3.8 (3.8.0 for the Jetson series, and 3.8.8 and 3.8.12 for the servers), PyTorch 1.7.0, and torchvision 0.8.0. As the experimental environment, we configured ten heterogeneous devices: two NVIDIA Jetson TX1, one NVIDIA Jetson TX2, five NVIDIA Jetson Nano, and two servers, which consist of one GTX
Conclusion and future works
In federated learning, there is a high chance that the participating devices are heterogeneous. This heterogeneity may cause a straggler problem in federated learning owing to the imbalanced computational performance of the devices. To address the straggler problem, we proposed a novel scheme, AMBLE, that adaptively adjusts the local mini-batch size and local epoch size for heterogeneous devices in federated learning with synchronous updates. Although there have been
CRediT authorship contribution statement
Juwon Park: Conceptualization, Investigation, Methodology, Writing – original draft. Daegun Yoon: Data curation, Visualization. Sangho Yeo: Formal analysis, Investigation, Software. Sangyoon Oh: Conceptualization, Funding acquisition, Supervision, Validation, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This research was jointly supported by the Basic Science Research Program (2021R1F1A1062779) of the National Research Foundation of Korea (NRF) funded by the Ministry of Education, the supercomputing application department at the Korea Institute of Science and Technology Information (KSC-2021-CRE-0363), and the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2021-2018-0-01431) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
References (35)
- et al., A PI consensus controller for networked clocks synchronization, IFAC Proc. Vol. (2008)
- et al., Federated learning with quantized global model updates
- et al., Practical secure aggregation for federated learning on user-held data
- et al., Practical secure aggregation for privacy-preserving machine learning
- et al., Towards federated learning at scale: system design
- et al., Round-Robin synchronization: mitigating communication bottlenecks in parameter servers
- et al., Revisiting distributed synchronous SGD
- et al., High-performance distributed ML at scale through parameter server consistency models
- et al., Large scale distributed deep networks
- et al., Adaptive gradient sparsification for efficient federated learning: an online learning approach
- Federated learning for mobile keyboard prediction
- Group knowledge transfer: federated learning of large CNNs at the edge, Adv. Neural Inf. Process. Syst.
- Distilling the knowledge in a neural network
- Adam: a method for stochastic optimization
- Federated optimization: distributed optimization beyond the datacenter
- Federated optimization: distributed machine learning for on-device intelligence
- Federated learning: strategies for improving communication efficiency
Juwon Park received his B.S. in Software Engineering from Ajou University in 2019. He is currently a master's student in the Department of Artificial Intelligence at Ajou University. His research interests include big data, deep learning, and distributed and parallel computing in heterogeneous environments.
Daegun Yoon received his B.S. in Software Engineering from Ajou University in 2018. He is currently a Ph.D. student in the Department of Artificial Intelligence at Ajou University. His research interests include highly scalable graph processing, general-purpose computing on GPUs (GPGPU), and high-performance computing (HPC).
Sangho Yeo received his B.S. in Software Engineering from Ajou University in 2017. He is currently a Ph.D. student in the Department of Artificial Intelligence at Ajou University. His research interests include deep reinforcement learning, distributed deep learning, federated learning with heterogeneous devices, and distributed and parallel computing.
Sangyoon Oh received his Ph.D. from the Computer Science Department, Indiana University, Bloomington, Indiana. He is currently a professor in the School of Information and Computer Engineering, Ajou University, South Korea. Before joining Ajou University, he worked for SK Telecom, South Korea. His main research interests include high-performance computing, distributed deep learning, and graph data processing.