
FedRtid: an efficient shuffle federated learning scheme via random participation and adaptive time constraints

Abstract

Federated learning is a promising distributed machine learning paradigm in which clients retain their private data and share only model parameters with the server, enabling secure, collaborative multi-user training of machine learning models. However, the frequent exchange of model parameters between clients and the server consumes large amounts of network and computing resources, and resource-constrained clients can hardly maintain model security while keeping collaborative training efficient. We therefore propose FedRtid, a shuffle differential privacy federated learning scheme with random participation and adaptive time constraints, which improves the efficiency of collaborative training while preserving model privacy. First, during model training each participating client decides locally and independently whether to take part in a round at random, which alleviates the user's resource constraints and reduces the interaction time needed to train the model, while differential noise is added to the shared model parameters to keep them secure. In addition, to prevent the decline in global model security caused by fewer clients participating in server-side aggregation, and the decline in accuracy caused by adding differential noise to all model parameters, we construct a user-side sparsification mechanism and an adaptive time-constrained shuffle mechanism that reduce the number of model parameters to which users add noise and enhance model security. Under two data distributions, independent and identically distributed (IID) and non-IID, we conduct extensive experiments on three real datasets; the results show that FedRtid can effectively balance the accuracy and privacy of the model.

Introduction

Federated Learning (FL) is a new distributed collaborative learning paradigm that opens a new path for solving the problems of data silos and privacy leakage (McMahan et al. 2017). During FL training, local users train models independently on their private datasets and share the trained local models with the server. Once all user models are collected, the server aggregates them and feeds the new global model back to the users. With this collaborative approach, in which data is never shared and only model parameters are exchanged, FL makes effective use of users' private data while protecting their privacy, and shows strong practicality in real-world application scenarios such as smart healthcare, finance, and autonomous driving (Bai et al. 2021; Yang et al. 2019; Salehi et al. 2022; Qu et al. 2022).

As the number of devices participating in federated training increases, both the dropout of user devices and the growing number of shared model parameters can cause privacy leakage (Wu et al. 2024a, b). For example, by analyzing the large number of model parameters a user exchanges with the server, an attacker with relevant background knowledge may be able to infer sensitive information about that user (Papernot et al. 2017; Nasr et al. 2019; Zhu et al. 2019). To cope with this privacy problem, Differential Privacy (DP) has been introduced into federated learning to enhance the security of user models (Dwork 2006). By adding differential noise to the model parameters shared by users, DP protects the models and quantifies their degree of security through the size of the privacy budget. Thanks to this simple mechanism and measurable privacy guarantee, differential privacy has been adopted by a large number of scholars and companies, such as Google (Erlingsson et al. 2014), Apple (Team 2017), and Microsoft (Ding et al. 2017).

Current research on differential privacy federated learning is mainly classified into central differential privacy federated learning (DP-FL), local differential privacy federated learning (LDP-FL), and shuffle differential privacy federated learning (SDP-FL), also called shuffle federated learning. DP-FL (McMahan et al. 2018; Geyer et al. 2017) achieves high accuracy but relies heavily on the trustworthiness of the server. In LDP-FL (Wang et al. 2019; Liu et al. 2020), users perturb their local models before sharing the model weights, so server trustworthiness need not be assumed, but model accuracy is lower. To balance the accuracy and privacy of the model more effectively, SDP-FL (Bittau et al. 2017; Erlingsson et al. 2020; Ghazi et al. 2019, 2021; Liu et al. 2021; Scott et al. 2022) has been developed. Unlike DP-FL, SDP-FL moves the parameter perturbation process back to the user, eliminating the dependence on server trustworthiness. In addition, SDP-FL compensates for the reduced usability of model parameters in LDP-FL caused by user-added noise through a random shuffling mechanism (Cheu et al. 2019; Erlingsson et al. 2019; Balle et al. 2019) and a subsampling mechanism (Balle et al. 2018; Yang et al. 2023). This change not only improves the privacy of the user's model parameters but also ensures that the user only needs to add a small amount of noise to achieve a satisfactory level of privacy.

However, in complex federated learning training with hundreds or thousands of users, existing differential privacy federated learning schemes find it difficult to balance collaborative training efficiency and model privacy (Girgis et al. 2021c; Feldman et al. 2021; Liew et al. 2022). The main reason is the imbalance of computing, storage, communication, and other resources. During secure model training, resource-constrained users can hardly sustain frequent exchanges of massive numbers of model parameters, which lowers collaborative training efficiency, since the efficiency of training between the server and the users depends on the slowest user. Moreover, while users spend large amounts of resources on local training and model interaction, their limited computing and storage resources leave little capacity for differentially protecting massive numbers of model parameters, which degrades model security. To address these challenges, we propose FedRtid, a shuffle federated learning scheme with random participation and adaptive time constraints. It strengthens users' autonomy to participate in collaborative training, constructs a Top-k sparsified random sampling mechanism that reduces the number of noisy model parameters a user shares, and thereby improves the efficiency of interactive model training. At the same time, an adaptive time-constrained shuffle model is established so that the federated learning model can effectively balance collaborative training efficiency and model privacy. Our contributions are summarized as follows:

  • In this paper, we propose FedRtid, an efficient shuffle federated learning framework based on random participation and adaptive time constraints, which effectively balances the collaborative training efficiency and privacy of federated learning models. During federated learning, user autonomy is strengthened: each user can independently and randomly decide whether to participate in each round of training after considering its own situation. At the same time, Top-k sparsification is used to select the important model parameters for differential noise addition, which reduces the amount of noise users add to each model parameter dimension and enhances both the efficiency and the security of collaborative model training.

  • To avoid the decline in global model security caused by fewer clients participating in training and by network limitations during server aggregation, we introduce a shuffler between the users and the server and set an adaptive shuffle waiting time. The shuffler adaptively extends the waiting time up to a maximum limit according to the number of user models received, and decides whether to add virtual models based on that number, which not only enhances model security but also further improves the efficiency of collaborative model training.

  • To ensure the authenticity and reliability of the experimental results, we divide three real datasets, MNIST, Fashion-MNIST (FMNIST), and CIFAR-10, into independent and identically distributed (IID) and non-IID partitions. Under these data distributions, we compare FedRtid with several commonly used federated learning schemes and with FedRtif (a full-parameter perturbation, fixed-time-constrained federated learning scheme, Algorithm 1) in a large number of experiments. The experimental results show that FedRtid effectively balances model privacy and collaborative training efficiency.

The rest of this article is organized as follows. In the next section, we review related work on differential privacy federated learning. In the preliminaries section, we briefly introduce the definitions, theorems, and lemmas of differential privacy and the shuffle model, to support the theoretical explanations in this article. In the FedRtid framework design section, we elaborate on the model framework and technical implementation of FedRtid. In the experiments section, we compare our proposed scheme with several commonly used differential privacy federated learning models. Finally, we summarize this paper and discuss future research directions.

Related works

In 2006, Prof. Dwork first proposed differential privacy (Dwork 2006), whose core idea is to add random noise to user data to obscure personal private information so that attackers cannot recover the users' original information. DP has attracted the attention of a large number of scholars due to its small computational cost and its quantifiable privacy protection strength, and it is a commonly used privacy protection technology in federated learning.

DP-FL

Abadi et al. (2016) first proposed a stochastic gradient descent algorithm based on differential privacy, which bounds the sensitivity of the model parameters through gradient clipping, adds differential noise to the model parameters during local model updates, and creatively proposes the Moments Accountant method to track the evolution of the model's privacy budget. Geyer et al. (2017) designed a user-oriented differentially private federated optimization method that takes the user as the protected unit and improves model privacy by adding differential noise to each model parameter, but the amount of noise introduced slows down model training. Truex et al. (2019) used differential privacy to perturb user model parameters and homomorphic encryption to transmit the encrypted parameters and aggregate the models, but the encryption and decryption process imposes additional computational overhead on the federated learning system. Wu et al. (2020) proposed an optimized DP-SGD scheme that adaptively accelerates model convergence through differentially private gradients and couples this with the privacy budget to quantify the performance of the noisy model, but it neglects server trustworthiness.

LDP-FL

To reduce the reliance on a trusted central server in federated learning, some works have proposed local differential privacy federated learning schemes, which avoid assuming the trustworthiness of the central server. For example, Wang et al. (2019) proposed novel LDP mechanisms that reduce the noise added to model parameters according to how numerical attributes are collected, and extended these mechanisms to multi-dimensional data tasks containing both numerical and categorical attributes. Liu et al. (2020) designed a two-stage differentially private federated learning scheme, FedSel, which adds differential noise only to the gradient parameters that contribute most to the model, thereby reducing the total amount of added noise and effectively improving model performance; however, with high-dimensional model parameters, incorrect parameter selection can easily prolong model convergence. Wei et al. (2020) designed a scheme that adaptively reduces the privacy budget according to the amount of user data, ensuring the privacy of the user model and improving accuracy by controlling the amount of random noise added. Truex et al. (2020) proposed a formal local differential privacy federated learning scheme, LDP-Fed, which reduces privacy loss through the selection and filtering of user models, although continued iterative training leads to a decline in model performance.

SDP-FL

To effectively balance model accuracy and privacy, shuffle differential privacy federated learning has gradually attracted researchers' attention. Girgis et al. (2021c) proposed a communication-efficient scheme that achieves efficient gradient aggregation for iterative model training via private mean estimation over \(\ell_p\) spaces. Feldman et al. (2021) perform shuffle privacy amplification on only the n data records input to the local randomizer, achieving an even stronger privacy guarantee. Balle et al. (2020) propose a learning model in which clients participate in training independently under the assumption that each client stores only one sample, but the scheme relies on a trusted data parser. Girgis et al. (2021b) and Girgis et al. (2021a) allow each client to hold multiple data samples, but the number of samples used for training is sampled in a fixed way: in each training iteration, Girgis et al. (2021a) randomly sample one record from the local data, while Girgis et al. (2021b) randomly sample multiple records of a fixed length. Sun et al. (2021) proposed a practical local differential privacy federated learning scheme, LDP-FL, which avoids the privacy budget explosion in the high-dimensional case by splitting and shuffling the parameters of the user model, reducing the differential noise variance and improving model accuracy. Liew et al. (2022) proposed a practical scenario for applying the shuffle check-in mechanism, improving the privacy guarantee threefold over a model using approximate differential privacy, but without considering communication time. Liu et al. (2021) proposed a double privacy amplification mechanism that effectively improves model accuracy, but ignores the problem of uneven resources among users.

The existing differential privacy federated learning schemes mentioned above effectively trade off model privacy and accuracy, but they give little consideration to resource-constrained users. Under resource constraints, computing and transmitting massive numbers of model parameters consumes more resources, and resource-constrained users find it difficult to maintain model privacy and security while keeping collaborative training efficient. Moreover, these schemes do not take into account that resource-constrained users affect the efficiency of collaborative training, since the training process depends on the slowest user, and such users struggle to find additional resources for interactive communication with the server while sustaining collaborative training. We therefore propose FedRtid, a shuffle federated learning model with random participation and adaptive time constraints. During multi-user collaborative training, the scheme strengthens users' autonomy: each user can independently and randomly decide whether or not to participate in each training round after considering its own resources. Once a user participates, it uses the Top-k sparsification technique to select the important model parameters and adds differential noise only to them, avoiding the situation in which noise added to all model parameters leaves resource-constrained users unable to balance model privacy against collaborative training efficiency. A shuffler is introduced between the users and the server, with an adaptive shuffling waiting time that is extended, according to the number of models received, up to a maximum limit; this reduces model loss and long sharing delays caused by the network environment and improves collaborative training efficiency. At the same time, the shuffler decides, based on the number of models received, whether to add virtual models, which improves the security of small batches of models and strengthens model privacy. After shuffling, the models are upgraded from satisfying \(\epsilon _l\)-local differential privacy to satisfying \(\epsilon _c\)-central differential privacy, where \(\epsilon _c < \epsilon _l\).

Preliminaries

In this section, we introduce some basic terms, definitions, and combinatorial properties related to differential privacy and privacy amplification for the shuffle model.

Differential privacy

Definition 1

(Central Differential Privacy). For \(\epsilon ,\delta \ge 0\), a random mechanism \(M:{\mathcal {D}} \rightarrow {\mathcal {S}}\) satisfies \((\epsilon ,\delta )\)-differential privacy (\((\epsilon ,\delta )\)-DP) if, for any pair of neighboring datasets \(D,D^\prime\) and any subset of outputs \(S \subseteq {\mathcal {S}}\), the following condition holds:

$$\begin{aligned} Pr[M(D)\in S]\le exp(\epsilon )Pr[M(D^\prime )\in S]+\delta \end{aligned}$$

Here, neighboring datasets are two datasets that differ in at most one record. The \(\epsilon\) is the privacy budget (privacy parameter), which represents the degree of privacy protection; the smaller it is, the stronger the protection. \(\delta (\delta \in [0,1])\) is the probability that the differential privacy guarantee is broken. When \(\delta = 0\), the mechanism M provides the strictest \(\epsilon\)-DP protection, also known as "pure differential privacy". While the \((\epsilon ,\delta )\)-DP mechanism can protect data, this traditional notion of differential privacy requires a trusted data processor, which is often impractical; this concern is avoided by local differential privacy, defined as follows:

Definition 2

(Local Differential Privacy). A random mechanism \(R:{\mathcal {D}} \rightarrow {\mathcal {Y}}\) satisfies \(\epsilon\)-Local Differential Privacy (\(\epsilon\)-LDP) if, for any pair of inputs \(x,x^{\prime }\) in \({\mathcal {D}}\) and any output set \({\mathcal {Y}}\), the following inequality holds:

$$\begin{aligned} Pr[R(x)\in {\mathcal {Y}}]\le e^{\epsilon }Pr[R(x^\prime )\in {\mathcal {Y}}] \end{aligned}$$

DP is an effective privacy protection tool whose composition properties have not only received widespread attention but also apply to both DP and LDP. The relevant composition properties are as follows (Dwork and Roth 2014):

Lemma 1

(Sequential Composition). Let \(M_1,..., M_n\) be n algorithms that satisfy differential privacy with privacy budgets \(\epsilon _1,...,\epsilon _n\). Their combination \({\mathcal {M}}_{1-n}=(M_1,...,M_n)\) satisfies \(\sum _{i=1}^{n}\epsilon _i\)-DP.

Lemma 2

(Parallel Composition). Let \(M_1,..., M_n\) be n algorithms that satisfy differential privacy with privacy budgets \(\epsilon _1,...,\epsilon _n\), each applied to a disjoint subset of the dataset. Their combination \({\mathcal {M}}_{1-n}=(M_1,...,M_n)\) satisfies \(\max (\epsilon _1,...,\epsilon _n)\)-DP.

Lemma 3

(Advanced Composition). For all \(\epsilon ,\delta ,\delta ^\prime > 0\), the k-fold adaptive composition of \((\epsilon ,\delta )\)-DP mechanisms satisfies \((\epsilon ^\prime ,k\delta +\delta ^\prime )\)-DP, where \(\epsilon ^\prime = \sqrt{2k\ln (\frac{1}{\delta ^\prime })}\,\epsilon + k\epsilon (e^{\epsilon } -1)\).

Lemma 4

(Post Processing). Given any randomized algorithm \(M_1\) that satisfies \(\epsilon\)-DP and any randomized algorithm \(M_2\) that need not satisfy differential privacy, the composition \(M(D) = M_2(M_1(D))\) also satisfies \(\epsilon\)-DP.

Shuffle model

The shuffle model is a distributed computing model consisting of three parts, \({\mathcal {P}} = {\mathcal {A}}\circ {\mathcal {S}}\circ {\mathcal {R}}\): an analyzer, a shuffler, and a local randomizer. Suppose there are n users involved in training, whose data can be represented as a dataset \(X = (d_1,...,d_n) \in {\mathcal {D}}^n\). During training, each user perturbs its data into m messages satisfying \(\epsilon _l\)-LDP using a local randomizer \({\mathcal {R}}:d_i \rightarrow Y^m\). Here we focus on the single-message case, i.e., m = 1. Then each user sends its report to the shuffler \({\mathcal {S}}: Y \rightarrow Y^*\), which randomly permutes all reports for anonymization. Finally, the server receives the reports from the shuffler and analyzes them, \({\mathcal {A}}: Y^* \rightarrow Z\).
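
To make the pipeline concrete, the following is a minimal sketch (not the authors' implementation) of the single-message shuffle model, assuming a generic k-ary randomized-response local randomizer over a discrete domain [b]; all function names and parameters are illustrative.

```python
import numpy as np

def local_randomizer(x, b, eps_l, rng):
    """k-ary randomized response: report the true value x in [0, b) with
    probability e^eps_l / (e^eps_l + b - 1), otherwise a uniform value."""
    p_true = np.exp(eps_l) / (np.exp(eps_l) + b - 1)
    return x if rng.random() < p_true else rng.integers(b)

def shuffler(reports, rng):
    """Randomly permute all reports, breaking the link between user and report."""
    reports = list(reports)
    rng.shuffle(reports)
    return reports

def analyzer(reports, b, eps_l, n):
    """Debias the randomized-response histogram to estimate value frequencies."""
    p_true = np.exp(eps_l) / (np.exp(eps_l) + b - 1)
    p_other = (1 - p_true) / (b - 1)
    counts = np.bincount(reports, minlength=b)
    return (counts - n * p_other) / (p_true - p_other)

rng = np.random.default_rng(0)
data = rng.integers(0, 10, size=1000)                    # n users, values in [0, 10)
reports = shuffler([local_randomizer(x, 10, 1.0, rng) for x in data], rng)
print(analyzer(np.array(reports), 10, 1.0, len(data)))   # approximate histogram
```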

Considering that \({\mathcal {A}}\) is an untrusted analyzer, by the post-processing property of Lemma 4 it suffices to ensure that \({\mathcal {M}} = {\mathcal {S}}\circ {\mathcal {R}}^n\) satisfies \((\epsilon _c,\delta _c)\)-DP; then \({\mathcal {P}}\) and \({\mathcal {M}}\) achieve the same level of privacy guarantee. When \(\epsilon _c < \epsilon _l\), \({\mathcal {M}}\) provides a stronger privacy guarantee, which is known as "privacy amplification". Compared with LDP-FL, SDP-FL requires only a small amount of added noise to achieve the same level of privacy guarantee. Among the existing work on privacy amplification, the privacy blanket proposed by Balle et al. (2019) is an excellent randomized-response privacy amplification result, stated as follows:

Theorem 1

If \(\sqrt{\frac{14log(2/\delta _c)(b-1)}{n-1}} < \epsilon _c \le 1\) and \(\mathcal {R_{r,b}}\) satisfies \(\epsilon _l\)-LDP, then \({\mathcal {M}} = {\mathcal {S}} \circ {\mathcal {R}}^n\) satisfies \((\epsilon _c,\delta _c)\)-DP, where \(\epsilon _c = \sqrt{\frac{14log(2/\delta _c)(e^{\epsilon _l}+ b - 1)}{n-1}}\).

Here \(\epsilon _l\) and \(\epsilon _c\) denote the local and central differential privacy budgets, respectively; b denotes the size of the discrete domain [b] into which the local randomizer encodes the input value x before randomizing the output; \(\gamma = \frac{b}{e^{\epsilon _l}+b-1}\) denotes the probability that the output is drawn uniformly at random from the discrete domain; and n denotes the number of users. To bound the privacy amplification, Balle et al. (2019) give the final amplification bound in Corollary 1.

Corollary 1

In the shuffle model, if \({\mathcal {R}}\) is \(\epsilon _l\)-LDP with \(\epsilon _l \le log(\frac{n}{log(1/\delta _c)})/2\), then \({\mathcal {M}}\) satisfies \((\epsilon _c,\delta _c)\)-DP with \(\epsilon _c = {\mathcal {O}}((1\wedge \epsilon _l)e^{\epsilon _l}\sqrt{log(1/\delta _c)/n})\).

Furthermore, privacy amplification in the shuffle model can also be achieved using the subsampling mechanism (Balle et al. 2018, 2020), with the following theorem:

Theorem 2

(Privacy Amplification by Subsampling). Sample m records without replacement from a set of n records to form a new set. If the mechanism \(M:{\mathcal {D}}^m \rightarrow {\mathcal {Y}}\) satisfies \((\epsilon ,\delta )\)-DP, then the mechanism \(M^{\prime }:{\mathcal {D}}^n \rightarrow {\mathcal {Y}}\) satisfies \((\epsilon ^{\prime },\delta ^{\prime })\)-DP with \(\epsilon ^{\prime } = log(1+\frac{m}{n}(e^{\epsilon } - 1))\) and \(\delta ^{\prime } = \frac{m}{n}\delta\).
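
As a quick illustration of this amplification formula (not part of the original analysis), a small Python snippet can compute the subsampled budget for example values of m, n, and \(\epsilon\):

```python
import math

def subsample_amplify(eps, delta, m, n):
    """Privacy amplification by subsampling m of n records without replacement."""
    eps_prime = math.log(1 + (m / n) * (math.exp(eps) - 1))
    delta_prime = (m / n) * delta
    return eps_prime, delta_prime

# Example: eps = 1.0 amplified by a 10% sampling rate becomes roughly 0.16.
print(subsample_amplify(eps=1.0, delta=1e-5, m=100, n=1000))
```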

FedRtid framework design

In this section, we present the technical details of the design of FedRtid, a shuffle federated learning framework, covering the overall framework design, the client random participation mechanism, Top-k sampling, and the adaptive shuffle constraint. We also formally analyze the privacy guarantees underlying the model.

FedRtid framework

As depicted in Fig. 1, FedRtid consists of three components, the local randomizer (\({\mathcal {R}}\)), the shuffler (\({\mathcal {S}}\)), and the analyzer (\({\mathcal {A}}\)), following the operation of the underlying shuffle model. 1) Local randomizer \({\mathcal {R}}\): suppose N users participate in training and each user holds a d-dimensional local model w. During the t-th iteration, client i independently and randomly decides whether to participate in this round; once it does, client i uses the local randomizer \({\mathcal {R}}_i: w_i^t \rightarrow w_i^{t,*}\) to perturb the local model \(w_i^t\) into a secure local model \(w_i^{t,*}\) and uploads \(w_i^{t,*}\) to the shuffler \({\mathcal {S}}\). 2) Shuffler \({\mathcal {S}}\): within the specified constraint time Ti, \({\mathcal {S}}\) keeps receiving the perturbed models \(w_i^{t,*}\) uploaded by the clients; once the constraint time has elapsed, the shuffler randomly shuffles the received models and sends them to the analyzer \({\mathcal {A}}\). 3) Analyzer \({\mathcal {A}}\): after \({\mathcal {A}}\) receives all the shuffled models, it analyzes and evaluates them, aggregates the processed valid models \(w_i^{t},i\in [N]\) into the \((t+1)\)-th round global model \(w^{t+1}\), and broadcasts it, where the aggregation takes the form \(w^{t+1} \leftarrow w^t + \frac{1}{n}\sum _i^n w_i^{t}\).

Fig. 1: FedRtid: federated learning in the shuffle model

Randomized participation mechanisms for client users

In traditional federated learning, all clients participating in the federation share their model parameters with the server to complete multi-user collaborative training. However, users must consume considerable computation, storage, and bandwidth to share large-scale model parameters and protect their privacy, and clients with insufficient computation, storage, or communication capabilities can hardly handle parameter sharing and privacy protection at the same time, which affects both the efficiency of collaborative training and model privacy. We therefore propose a user random sampling algorithm that improves the efficiency and privacy of multi-user collaborative training by strengthening the user's autonomous choice; the specific execution process is shown in Algorithm 1.

Algorithm 1 (pseudocode figure)

In training round t, the analyzer \({\mathcal {A}}\) broadcasts the global model \(w^t\) to all client users at line 3 of Algorithm 1 and invites them to participate in this round of learning. Assume that during user sampling, each user independently flips a biased coin with probability p and decides to participate in this round of training only if it returns heads. For simplicity, we refer to the probability p that a user successfully participates in training as the random participation success rate; the decision follows a Bernoulli distribution Bern(p). Given the uncertainty of user resources such as network, computation, and storage, users who successfully join the training may still fail to complete it or quit because of resource shortages, which affects the final communication efficiency and model security. To reduce such cases, in each round the user independently decides whether to participate in co-training according to its resource conditions; we regard this decision as the probability \(p^{\prime }\) that the user independently withdraws from training, so the effective random participation success rate \(\beta\) can be expressed as \(\beta =p(1-p^{\prime })\).
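
A minimal client-side sketch of this random participation decision might look as follows; the probabilities p and p_prime stand in for each user's own settings, and the function names are illustrative rather than part of the paper.

```python
import random

def decides_to_participate(p: float, p_prime: float, rng: random.Random) -> bool:
    """Flip a biased coin with success probability p, then independently drop out
    with probability p_prime; the effective participation rate is beta = p * (1 - p_prime)."""
    joins = rng.random() < p          # Bern(p): invitation coin flip succeeds
    stays = rng.random() >= p_prime   # does not withdraw due to resource limits
    return joins and stays

rng = random.Random(42)
beta = 0.8 * (1 - 0.1)
rate = sum(decides_to_participate(0.8, 0.1, rng) for _ in range(100_000)) / 100_000
print(f"empirical participation rate {rate:.3f}, expected beta {beta:.3f}")
```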

When client i decides to engage in training, it uses the global model \(w^t\) of round t to train on its private dataset \(D_i\), and clips and processes the d-dimensional model \(w_i^t\) it has trained (lines 10-11), where C is the clipping threshold. In line 13 of Algorithm 1, client i adds Laplace or Gaussian noise to its model \(w_i^t\) using the local randomizer \(R_{\gamma }^{d}\) and generates a hidden ID symbol \(id_{li}\) used to determine the validity of \(w_i^t\). The client then transmits the perturbed model \(w_i^{t,*}\) and the valid-model identifier \(id_{li}\) to the shuffler \({\mathcal {S}}\). By improving the quality of the participating users under added noise, the imbalance between collaborative training efficiency and privacy caused by users' resource constraints is effectively mitigated. Next, the shuffler \({\mathcal {S}}\) randomly shuffles the received models and sends the shuffled data to the untrusted analyzer \({\mathcal {A}}\). Finally, the analyzer \({\mathcal {A}}\) analyzes and corrects the randomly permuted data uploaded by the shuffler, and aggregates the valid user model parameters \(w_i,i\in [n\beta ]\) into a new global model \(w^{t+1} \leftarrow w^t + \frac{1}{n\beta }\sum _{i=1}^{n\beta }w_i\) (lines 18-21). To analyze the privacy guarantees of Algorithm 1, we can summarize the new operating mechanism as \({\mathcal {P}} = {\mathcal {A}}\circ {\mathcal {S}}\circ {\mathcal {R}}_{\gamma }^{d}\circ \mathcal{P}\mathcal{S}^{\beta }\). Since the primary function of \({\mathcal {A}}\) is to analyze the user model parameters, by the post-processing property of Lemma 4 it suffices that \({\mathcal {M}}={\mathcal {S}}\circ {\mathcal {R}}_{\gamma }^{d}\circ \mathcal{P}\mathcal{S}^{\beta }\) satisfies differential privacy.
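
The following is a hedged sketch of the per-client clipping, per-dimension noise addition, and server-side averaging described above, assuming a Laplace mechanism with a per-dimension budget \(\epsilon _l/d\); the helper names, the clipping rule, and the noise scale are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def local_randomize(w_local: np.ndarray, eps_l: float, C: float, rng) -> np.ndarray:
    """Clip the d-dimensional model to norm C, then add Laplace noise to every
    dimension under a per-dimension budget eps_l / d (sequential composition)."""
    d = w_local.size
    w_clipped = w_local / max(1.0, np.linalg.norm(w_local) / C)
    scale = 2 * C / (eps_l / d)          # illustrative per-dimension sensitivity bound of 2C
    return w_clipped + rng.laplace(0.0, scale, size=d)

def aggregate(w_global: np.ndarray, perturbed_models: list) -> np.ndarray:
    """Server-side update: w^{t+1} <- w^t + (1 / (n*beta)) * sum of valid models."""
    return w_global + np.mean(perturbed_models, axis=0)

rng = np.random.default_rng(0)
w_global = np.zeros(100)
updates = [local_randomize(rng.normal(size=100), eps_l=100.0, C=1.0, rng=rng)
           for _ in range(20)]           # 20 participating clients this round
w_global = aggregate(w_global, updates)
```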

Assume that each client's user model uses a total privacy budget of \(\epsilon _l\). When client i successfully participates in training, it executes \({\mathcal {R}}_{\gamma }^{d}\) to perturb its model parameters, guaranteeing that each perturbed model parameter satisfies the \(\epsilon _{dl}\)-LDP privacy guarantee, where \(\epsilon _{dl} = \epsilon _{l}/d\). Since all parameters of \(w_i^t\) form one combination, the per-dimension budget \(\epsilon _{dl}\) follows from applying the sequential composition property of Lemma 1 in reverse. Thus, a client's execution of the local parameter perturbation \({\mathcal {R}}_{\gamma }^{d}\circ \mathcal{P}\mathcal{S}^{\beta } = {\mathcal {R}}(w_i^t,\epsilon _l)\circ \mathcal{P}\mathcal{S}^{\beta }\) can be viewed as \(w_i^{*} \leftarrow R_{\epsilon _{dl}}(w_{i}^t,ci_1),..., R_{\epsilon _{dl}}(w_i^t,ci_d)\), where \(ci_i, i \in [d]\) denotes the index of client i's parameters in the d-dimensional model. When \({\mathcal {S}}\) receives the perturbed model parameters shared by the client, it randomly permutes all the parameters of the model, breaking the link between the user and the model, ensuring that the parameter shuffling process meets the requirements of the shuffle model and enhancing model privacy. After the client-side perturbation and the random permutation of the shuffler, the whole execution of Algorithm 1 satisfies the privacy guarantee of Theorem 1. It can therefore be concluded that the privacy level of Algorithm 1, running the mechanism \({\mathcal {M}} = {\mathcal {S}}\circ {\mathcal {R}}_{\gamma }^{d}\circ \mathcal{P}\mathcal{S}^{\beta }\), reaches \((\epsilon _{cs},\delta _{cs})\)-DP. In addition, according to the probability of information leakage under differential privacy, the user should guarantee \(\delta _{cd}<2\beta\) during local training, which is a very reasonable setting since the standard requirement \(\delta _{cd}\ll \frac{1}{nd}\) makes the leakage probability negligible. Thus, Algorithm 1 fulfills the privacy guarantee in Theorem 3.

Theorem 3

Under \(\gamma =\frac{b}{e^{\epsilon _{dl}}+b-1}\) and \(\delta _{cd}<2\beta\), for any neighboring datasets \(D, D^\prime\) that differ in one user's d-dimensional local vector, the mechanism \({\mathcal {M}} = {\mathcal {S}}\circ {\mathcal {R}}_{\gamma }^{d}\circ \mathcal{P}\mathcal{S}^{\beta }\) will satisfy \((\epsilon _{cs},\delta _{cs})\)-DP, where

$$\begin{aligned} \epsilon _{cs} = \sqrt{\frac{14log(2/\delta _{cd})(e^{\epsilon _{dl}}+ b - 1)}{d -1}} \end{aligned}$$
(1)

Since clients decide whether or not to participate in training with the random success probability \(\beta\), \(\mathcal{P}\mathcal{S}^{\beta }\) is the mechanism by which clients independently join training with probability \(\beta\). In expectation, \(N\beta\) clients choose to participate in each round of training, and this approach also satisfies the subsampling privacy amplification required by Theorem 2. Therefore, the privacy guarantee of Algorithm 1 is further improved to \((\epsilon _{cd},\delta _{cd})\)-DP, where \(\epsilon _{cd} = log(1+\beta (e^{\epsilon _{cs}} - 1))\).

After the above local random perturbation, subsampling-based privacy amplification, and shuffling, the user model achieves an \(\epsilon _{cd}\)-DP guarantee on each parameter dimension. When the user model parameters are combined from the dimension level to the vector level, i.e., into a complete user model, then by the composition properties of Lemma 1 and Lemma 3, Algorithm 1 achieves a final central privacy guarantee of \((\epsilon _c,\delta _c)\)-DP at the vector level, where \(\epsilon _{c}=d\epsilon _{cd}\wedge (\epsilon _{cd}\sqrt{2dlog(\frac{1}{\delta _{cd}})} + d\epsilon _{cd}(e^{\epsilon _{cd}}-1))\) and \(\delta _{c}=\beta \delta _{cd}(d+1)\). In conjunction with Corollary 1, Algorithm 1 also amplifies the privacy budget \(\epsilon _l\) to the privacy budget \(\epsilon _c\) of Corollary 2.
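
To illustrate how these accounting formulas chain together, the following is a small sketch (an assumed helper, not from the paper) that evaluates the per-dimension amplification of Eq. (1), the subsampling step of Theorem 2, and the vector-level composition above for example parameter values of d, b, n\(\beta\), \(\epsilon _l\), and \(\delta _{cd}\):

```python
import math

def alg1_privacy_budget(eps_l, delta_cd, d, b, beta):
    """Chain the privacy accounting steps of Algorithm 1 for example parameters."""
    eps_dl = eps_l / d                                           # per-dimension local budget
    eps_cs = math.sqrt(14 * math.log(2 / delta_cd)               # Eq. (1): shuffle amplification
                       * (math.exp(eps_dl) + b - 1) / (d - 1))
    eps_cd = math.log(1 + beta * (math.exp(eps_cs) - 1))         # subsampling amplification (Theorem 2)
    eps_c = min(d * eps_cd,                                      # basic vs. advanced composition
                eps_cd * math.sqrt(2 * d * math.log(1 / delta_cd))
                + d * eps_cd * (math.exp(eps_cd) - 1))
    delta_c = beta * delta_cd * (d + 1)
    return eps_c, delta_c

# Example values only; they are not the configuration used in the paper's experiments.
print(alg1_privacy_budget(eps_l=50.0, delta_cd=1e-6, d=1000, b=2, beta=0.5))
```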

Corollary 2

Under \(\epsilon _l \le d\cdot \log {(\beta n/\log ((d+1)/\delta _c))}/2\), the amplified central differential privacy budget \(\epsilon _c\) is as follows:

$$\begin{aligned} \epsilon _c&=log(1+\beta (e^{{\mathcal {O}}((1\wedge \epsilon _{dl})e^{\epsilon _{dl}}log(d/\delta _c)/\sqrt{d/n})}-1)) \end{aligned}$$
(2)

Top-k sampling and adaptive shuffling constraints

Algorithm 1 effectively balances the efficiency of collaborative user training and model privacy by randomly improving the quality of participating users. However, when the communication environment is restricted, the model uploaded by a client may take longer to transmit or even be lost, which lengthens the client-server communication waiting time. In addition, since clients perturb all dimensions of the model, server aggregation of perturbed model parameters from only a small number of users is also prone to degrading model performance. We therefore further propose, on top of Algorithm 1, new mechanisms for the users' local perturbation and for the shuffler, to reduce the waiting time of client users and improve model quality. These two mechanisms are the Top-k random sampling perturbation mechanism for model parameters and the adaptive constrained shuffling mechanism; the specific implementations are shown in Algorithm 2 and Algorithm 3.

Given that each client model is trained on its own data, the larger the absolute value of a model parameter's gradient during training, the greater its contribution to the convergence of the user model. Specifically, the larger the gradient value, the less converged the corresponding parameter is and the farther it is from convergence, so the more noise it can accommodate; conversely, the smaller the gradient value, the more stable the parameter and the more sensitive it is to added noise. We therefore designed Algorithm 2. When a user enters the local random perturbation phase, it runs \(val_{idx}\leftarrow flattened(w_i^t)\) to obtain all the parameter values \(val_{idx}\) of the d-dimensional model, and runs \(idx_{tk} \leftarrow sort(abs(val_{idx}))[:tk]\) to obtain the k most important parameter values of the user-trained model (k \(\ll\) d). The user then traverses the model dimensions, adds appropriate differential noise to the parameter values in \(idx_{tk}\), and uses the model \(w_i^{*}\) perturbed by \({\mathcal {R}}\) as the model shared in this round. At the same time, the client generates a special ID to prove that the model it shares in this round of training is valid, encrypts this ID with \(Enc(\cdot )\), and sends the encrypted ID together with \(w_i^{*}\) to the shuffler. With this randomized perturbation approach, the user not only avoids the privacy leakage risk caused by being unable to add differential noise to the model parameters, but also frees more resources for maintaining the exchange of model parameters, thereby improving the efficiency of collaborative model training. Moreover, this approach reduces the privacy budget consumed after local model aggregation and allows the k perturbed model parameters to benefit from the new privacy budget \(\epsilon _{kl}=\frac{\epsilon _l}{k}\), because each perturbed parameter adds less noise while the total privacy budget \(\epsilon _l\) remains unchanged.

Algorithm 2 (pseudocode figure)
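
Below is a minimal sketch of the Top-k perturbation step described above, assuming Laplace noise with a per-parameter budget \(\epsilon _l/k\); the selection rule, the noise scale, and the function names are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def topk_perturb(w_local: np.ndarray, k: int, eps_l: float, C: float, rng) -> np.ndarray:
    """Add Laplace noise only to the k parameters with the largest absolute values,
    splitting the total budget eps_l evenly among them (eps_kl = eps_l / k)."""
    val_idx = w_local.flatten()
    idx_tk = np.argsort(np.abs(val_idx))[-k:]        # indices of the k largest-magnitude parameters
    scale = 2 * C / (eps_l / k)                      # illustrative per-parameter noise scale
    perturbed = val_idx.copy()
    perturbed[idx_tk] += rng.laplace(0.0, scale, size=k)
    return perturbed.reshape(w_local.shape)

rng = np.random.default_rng(0)
w_shared = topk_perturb(rng.normal(size=(10, 10)), k=5, eps_l=10.0, C=1.0, rng=rng)
```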

To cope with the problem that models shared by clients cannot be uploaded to the shuffler in time due to network conditions, which lowers the security of the shuffled models, we improve the shuffler as shown in Algorithm 3. Assuming that, in a slightly congested environment, the time for a client to complete model sharing in round t is \(Ti_{t}\), the minimum and maximum waiting times for the shuffler to submit the shuffled weights are set to \(2Ti_{t}\) and \(n\beta Ti_{t}\). Each time the shuffler receives a valid client model, it dynamically adds one \(Ti_{t}\) to the minimum shuffle constraint time to extend the shuffler's deadline. If the shuffler reaches the final submission time \(n\beta Ti_{t}\) and the number of client models received does not meet the minimum number \(n\beta\) required for this round, it fills the remaining quota with the global model \(w^t\) and generates an embedded label ID \(id_{si}\) for each such invalid model, so that the minimum number of shuffled models is met. As described in Algorithm 3, when the shuffler virtually supplements the number of models, it generates invalid models, marked by the embedded ID, that represent clients not involved in the training. After the constraint time expires, the shuffler randomly shuffles the valid and invalid models, cutting the potential link between model and user, and uploads the unordered models to the analyzer. Since the analysis and aggregation of the analyzer are consistent with Algorithm 1, when the analyzer receives the user models it decrypts the IDs of the model parameters uploaded by the shuffler and then excludes invalid models, so that the new round of global model updates is not affected. In this way, FedRtid not only effectively improves collaborative training efficiency and avoids the impact of randomly filled virtual data on global model accuracy, but also achieves an effective trade-off between collaborative training efficiency and model privacy.

Algorithm 3 (pseudocode figure)
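
The following is a hedged sketch of the adaptive waiting and virtual-model filling logic described above; the event-loop structure, the time source, and the names are illustrative assumptions rather than the paper's implementation.

```python
import random
import time

def adaptive_shuffle(receive_model, w_global, Ti, n_beta, rng=random.Random(0)):
    """Collect client models until an adaptively extended deadline, then pad with
    copies of the global model (flagged invalid) and shuffle before uploading."""
    deadline = time.monotonic() + 2 * Ti            # minimum waiting time 2*Ti
    max_deadline = time.monotonic() + n_beta * Ti   # maximum waiting time n*beta*Ti
    models = []
    while time.monotonic() < min(deadline, max_deadline):
        received = receive_model(timeout=0.1)       # assumed callback: returns (model, id) or None
        if received is not None:
            models.append(received)
            deadline += Ti                          # extend the deadline per valid model
    while len(models) < n_beta:                     # fill missing slots with virtual models
        models.append((w_global, "invalid_id"))
    rng.shuffle(models)                             # break the user-model link
    return models
```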

To analyze the privacy guarantee of the FedRtid model, we can summarize the new operating mechanism as \({\mathcal {P}} = {\mathcal {A}}\circ {\mathcal {S}}_{df}\circ {\mathcal {R}}_{\gamma ,d}^{tk}\circ \mathcal{P}\mathcal{S}^{\beta }\), where \({\mathcal {S}}_{df}\) is the shuffler that supplements df virtual models, \({\mathcal {R}}_{\gamma , d}^{tk}\) is the local randomizer that randomly selects tk of the d-dimensional model parameters to which differential noise is added, and \(\mathcal{P}\mathcal{S}^{\beta }\) is the mechanism by which clients randomly participate in training with probability \(\beta\). Compared with Algorithm 1, FedRtid implements local double-random subsampling perturbation through \({\mathcal {R}}_{\gamma ,d}^{tk}\circ \mathcal{P}\mathcal{S}^{\beta }\), which ensures that the perturbed model parameters can use the new privacy budget \(\epsilon _{kl} = \frac{\epsilon _l}{k}\), reduces the impact of noise on model performance, and leaves users with more resources for collaborative training. At the same time, \({\mathcal {S}}_{df}\) dynamically adjusts the shuffling constraint time and adds virtual models, reducing the impact of the network environment on collaborative training and further improving model privacy. Apart from the above, the rest of FedRtid is the same as Algorithm 1, so, based on the privacy guarantee of Algorithm 1, we give the complete privacy amplification guarantee and vector-level privacy composition of the mechanism \({\mathcal {M}} = {\mathcal {S}}_{df}\circ {\mathcal {R}}_{\gamma ,d}^{tk}\circ \mathcal{P}\mathcal{S}^{\beta }\) in Theorem 4.

Theorem 4

Under \(\gamma =\frac{b}{e^{\epsilon _{kl}}+b-1}\) and \(\delta _{cd}<2\beta tk/d\), for any neighboring datasets \(D, D^\prime\) that differ in one user's d-dimensional local vector, the mechanism \({\mathcal {M}} = {\mathcal {S}}_{df}\circ {\mathcal {R}}_{\gamma ,d}^{tk}\circ \mathcal{P}\mathcal{S}^{\beta }\) will satisfy \((\epsilon _{c},\delta _{c})\)-DP:

$$\begin{aligned} \epsilon _{kl}&= 1+log(\frac{exp(\epsilon _l) - 1}{tk/d}) \end{aligned}$$
(3)
$$\begin{aligned} \epsilon _{cd}&= log(1+\beta (e^{\sqrt{\frac{14log(2tk/d\delta _{cd})(e^{\epsilon _{kl}}+ b - 1)}{\alpha d -1}}} - 1)) \end{aligned}$$
(4)
$$\begin{aligned} \epsilon _{c}&=tk\epsilon _{cd}\wedge (\epsilon _{cd}\sqrt{2tklog(\frac{1}{\delta _{cd}})} + tk\epsilon _{cd}(e^{\epsilon _{cd}}-1)) \end{aligned}$$
(5)
$$\begin{aligned} \delta _{c}&=\beta \delta _{cd}(tk+1) \end{aligned}$$
(6)

It should be noted that the model parameters are randomly sampled in only tk dimensions for perturbation, and virtual model filling is implemented in the shuffler. Combining the sampling rates \(\alpha =\frac{tk}{d}\) and \(\beta\), the privacy amplification bound in Corollary 2 can be further tightened to Corollary 3.

Corollary 3

For FedRtid, under \(\epsilon _l \le \alpha d\log {(\beta n/\log ((2\alpha d+\alpha )/\delta _c))}/2\), the amplified central differential privacy budget \(\epsilon _c\) is as follows:

$$\begin{aligned} \epsilon _c&=log(1+\beta (e^{{\mathcal {O}}((1\wedge \epsilon _{kl})e^{\epsilon _{kl}}\alpha ^{1.5}\sqrt{\frac{\alpha d}{\delta _c}})}-1)) \end{aligned}$$
(7)

Experiments

Experimental setup

To verify the effectiveness of the scheme, we used three of the most popular datasets, MNIST, FMNIST, and CIFAR-10, for experimental validation. Considering the complexity and disorder of user data distributions in the real world, we divide each dataset into two data types, independent and identically distributed (IID) and non-IID (Li et al. 2022), and randomly distribute the partitioned data to different users, as shown in Fig. 2. In addition, to better train the various datasets, we design a convolutional neural network with weight dimension \(d=50618\) for the MNIST and FMNIST datasets and one with weight dimension \(d=231562\) for the CIFAR-10 dataset, and we take the average of three experimental runs as the final result to ensure the authenticity of the experimental results.

Fig. 2: Under the two data distributions, the amount and type of data owned by the ten clients

To fully verify the validity of FedRtid, we conducted experiments with 200 clients in a local Linux environment with three GeForce RTX 3090 Ti GPUs. Meanwhile, to evaluate the FedRtid model more accurately and effectively, we compare FedRtid with FedRtif, a fixed-time shuffle scheme with full parameter perturbation based on Algorithm 1, and with five popular existing federated learning models: FedAvg (McMahan et al. 2017), DP-FedAvg (Wu et al. 2020), LDP-FedAvg (Truex et al. 2020), SDP-FedAvg1 (Scott et al. 2022), and SDP-FedAvg2 (Liew et al. 2022).

Model performance analysis

Fig. 3: Model performance under IID and Non-IID data distributions. MNIST and FMNIST: \(\epsilon _l = 506.18,\alpha =0.02\); CIFAR-10: \(\epsilon _l = 2315.62,\alpha =0.01\)

To compare the impact of our scheme on model performance with other privacy-preserving federated learning schemes under a uniform privacy budget, we set the total privacy budget \(\epsilon _l\) to 506.18 for MNIST and FMNIST and to 2315.62 for CIFAR-10, with a model sparsification parameter of 0.7 and a random user participation rate of 1.0. After one round of global training, the model running the FedRtid scheme amplifies its privacy to \((0.37,5e-6)\)-DP. Figure 3 lists the model accuracies of the seven schemes on both the IID and Non-IID data distributions. We observe that the accuracies achieved by FedRtid are all much higher than those of the DP-FedAvg1 scheme satisfying the same \((0.37,5e-6)\)-DP privacy guarantee; on MNIST in particular, FedRtid's model accuracy is 0.34 and 0.30 higher, respectively.

Under the IID and Non-IID distributions of the three datasets, the performance of LDP-FedAvg with a \((0.37,5e-6)\)-DP privacy guarantee is much lower than that of FedRtid, which is attributed to the Top-k sparsified sampling mechanism that reduces the amount of added noise and the adaptive shuffling mechanism that further amplifies the privacy guarantee and improves the quality of the aggregated global model. In contrast, LDP-FedAvg adds noise to all model parameters and has no shuffling-based amplification. When the DP-FedAvg2 privacy guarantee is set to \((34.78,5e-6)\)-DP, the accuracy of DP-FedAvg2 exceeds that of FedRtid and approaches that of NP-FedAvg, but the privacy guarantee of FedRtid is 94 times stronger than that of DP-FedAvg2. For SDP-FedAvg1, SDP-FedAvg2, and FedRtif with a privacy guarantee of \((2.68,5e-6)\)-DP, FedRtid enhances model performance by reducing the noise added to each dimension of the model parameters. Taken together, under the same total privacy budget, the FedRtid algorithm achieves both better privacy and better accuracy.

Model privacy analysis

To analyze the impact of the privacy budget on model performance and privacy, we give two values of \(\epsilon _l\) in Table 1 and conduct experiments on the two data distributions of the three datasets MNIST, FMNIST, and CIFAR-10. As shown in Table 1, after substituting \(\epsilon _l=506.18\) into the two schemes, the privacy guarantee after FedRtif amplification reaches \(\epsilon _c=2.68\), while FedRtid achieves the stronger guarantee \(\epsilon _c=0.37\). Under the IID and Non-IID distributions of MNIST, the accuracy of the FedRtid model is 0.23 and 0.26 higher than that of FedRtif, respectively. When the model is validated on CIFAR-10, we find that the privacy amplification effect becomes more prominent, but the accuracy of the model decreases. The reason may be that the model complexity is high and the number of weight parameters is very large, so the fixed-k sampled weight perturbation deviates from the true weight values, causing a slight decline in model accuracy. We argue that for more complex neural networks and datasets, although the fixed-k sampling perturbation loses some model accuracy, this safer scheme is still worth considering.

Table 1 The effect of \(\epsilon _l\) on the model

Model stability analysis

To explore the effect of random client participation on model performance, we let the client users participating in FedRtid training follow random participation ratios of 0.2, 0.4, 0.6, 0.8, and 1.0, and the results are shown in Fig. 4. As the random participation ratio \(\beta\) increases, the number of users who can participate normally also increases, and FedRtid's model accuracy on the various data distributions rises accordingly. Between client participation ratios of 0.2 and 1.0, the model accuracy that FedRtid achieves after training on the Non-IID distribution of FMNIST differs by up to 26.9%. This is because, as more users participate in training, the quality of the participating users is strengthened, which in turn improves the quality of the shared model parameters, reduces the impact of differential noise on aggregation performance, and ultimately improves model quality.

Fig. 4: The impact of the client participation ratio on the performance of the model

Model collaboration efficiency analysis

To further measure the relationship between collaborative training efficiency and model accuracy for FedRtid, we visualize, under the IID and Non-IID data distributions of the three datasets, the computation time required by all schemes to complete 30 rounds of model training and the accuracy they achieve; the results are shown in Fig. 5. Observing Fig. 5, we find that FedRtid has good collaborative training efficiency: in Fig. 5c in particular, FedRtid's time cost is 663 min lower than that of LDP-FedAvg, while its model accuracy is 35.56% and 20.51% higher. As the complexity of the dataset rises, FedRtid's advantage in collaborative training gradually increases. This is because training CIFAR-10 requires a more complex model, and FedRtid adds differential noise perturbation to only k model parameters, whereas LDP-FedAvg adds differential noise to all model parameters. The latter therefore uses more resources for the perturbation and sharing of model parameters; in a network environment with insufficient bandwidth, it needs more time to complete the interactive collaborative training task, and some poorly provisioned users cannot even complete model transmission, which in turn affects the whole collaborative training process.

Fig. 5: Under IID and Non-IID data distribution, the total time taken to complete the training of the model and the test accuracy rate

We also compare FedRtid with the seven schemes DP-FedAvg1, DP-FedAvg2, NP-FedAvg, LDP-FedAvg, SDP-FedAvg1, SDP-FedAvg2, and FedRtif. In terms of time consumption, FedRtid is similar to DP-FedAvg1, DP-FedAvg2, SDP-FedAvg1, SDP-FedAvg2, and FedRtif, higher than NP-FedAvg, and lower than LDP-FedAvg. The main reason is that DP-FedAvg1 and DP-FedAvg2 let the server add the noise, although in different amounts; NP-FedAvg has no noise-adding step; LDP-FedAvg adds more noise and consumes more user resources; SDP-FedAvg1, SDP-FedAvg2, and FedRtif use shuffling; and FedRtid selects only some of the model parameters for noise perturbation, which reduces the resource consumption of adding noise, so users running the FedRtid scheme can effectively improve collaborative training efficiency. In terms of model accuracy, FedRtid is slightly lower than DP-FedAvg2 and NP-FedAvg and higher than DP-FedAvg1, LDP-FedAvg, SDP-FedAvg1, and SDP-FedAvg2. This is mainly because DP-FedAvg1 and DP-FedAvg2 provide a centralized differential privacy mechanism whose security depends on the central server; NP-FedAvg provides no security mechanism; LDP-FedAvg, SDP-FedAvg1, and SDP-FedAvg2 add too much noise; and FedRtid not only provides a shuffle differential privacy guarantee but also preserves model accuracy by reducing the amount of added noise. Overall, in complex data distribution scenarios and limited network environments, the FedRtid scheme can effectively improve collaborative training efficiency while improving model accuracy and ensuring model security.

Conclusion

In this paper, we propose FedRtid, to our knowledge the first shuffle federated learning model with randomized participation and adaptive time constraints. Through the clients' randomized participation and the Top-k sampled parameter perturbation, the scheme both improves the efficiency of collaborative model training for resource-constrained users and enhances the privacy of their models. Meanwhile, the shuffler's adaptive dynamic time constraints and virtual-model filling ensure that FedRtid can effectively mitigate long model training times caused by the network, as well as the low privacy and poor accuracy caused by having too few shuffled models. Finally, we conducted experiments in various scenarios and compared FedRtid with other state-of-the-art methods, demonstrating that FedRtid can effectively balance collaborative training efficiency and privacy, especially under limited resources. Our study also highlights the importance of studying the balance between privacy and collaboration efficiency for federated learning models.

Additionally, the combination of our scheme with other privacy-preserving methods, such as homomorphic encryption and secure multi-party computation, needs further joint research, and this is one of the focuses of our future work. In the future, we also plan to apply the proposed FedRtid to other federated learning settings, e.g., natural language models for secure and efficient text analysis. Meanwhile, we also want to design lightweight federated learning models using distillation techniques for better compatibility with users' different device types and adaptation to extreme network conditions.

Availability of data and materials

All data generated or analyzed during this study are included in this paper.

References

  • Abadi M, Chu A, Goodfellow I, et al (2016) Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC conference on computer and communications security. Association for computing machinery, New York, NY, USA, CCS ’16, p 308-318

  • Bai X, Wang H, Ma L et al (2021) Advancing covid-19 diagnosis with privacy-preserving collaboration in artificial intelligence. Nature Machine Intelligence 3(12):1081–1089


  • Balle B, Barthe G, Gaboardi M (2018) Privacy amplification by subsampling: tight analyses via couplings and divergences. In: Bengio S, Wallach HM, Larochelle H et al (eds) Advances in neural information processing systems 31: annual conference on neural information processing systems 2018, NeurIPS 2018. Montréal, Canada, pp 6280–6290


  • Balle B, Kairouz P, McMahan B et al (2020) Privacy amplification via random check-ins. In: Larochelle H, Ranzato M, Hadsell R et al (eds) Advances in neural information processing systems, vol 33. Curran Associates Inc, New york, pp 4623–4634

  • Balle B, Bell J, Gascón A, et al (2019) The privacy blanket of the shuffle model. In: Boldyreva A, Micciancio D (eds) Advances in cryptology - CRYPTO 2019 - 39th Annual international cryptology conference, Santa Barbara, CA, USA, August 18-22, 2019, Proceedings, Part II, Lecture notes in computer science, vol 11693. Springer, pp 638–667

  • Bittau A, Erlingsson Ú, Maniatis P, et al (2017) Prochlo: Strong privacy for analytics in the crowd. In: Proceedings of the 26th symposium on operating systems principles, Shanghai, China, October 28-31, 2017. ACM, pp 441–459

  • Cheu A, Smith AD, Ullman JR, et al (2019) Distributed differential privacy via shuffling. In: Ishai Y, Rijmen V (eds) Advances in cryptology - EUROCRYPT 2019 - 38th Annual international conference on the theory and applications of cryptographic techniques, Darmstadt, Germany, May 19-23, 2019, Proceedings, part I, lecture notes in computer science, vol 11476. Springer, pp 375–403

  • Ding B, Kulkarni J, Yekhanin S (2017) Collecting telemetry data privately. In: Proceedings of the 31st international conference on neural information processing systems. Curran Associates Inc., Red Hook, NY, USA, NIPS’17, pp 3574–3583

  • Dwork C (2006) Differential privacy. In: Bugliesi M, Preneel B, Sassone V et al (eds) Automata, languages and programming, 33rd international colloquium, ICALP 2006, Venice, Italy, July 10–14, 2006, Proceedings, part II, vol 4052. Lecture notes in computer science. Springer, pp 1–12

  • Dwork C, Roth A (2014) The algorithmic foundations of differential privacy. Found Trends Theor Comput Sci 9(3–4):211–407

  • Erlingsson Ú, Feldman V, Mironov I, et al (2019) Amplification by shuffling: From local to central differential privacy via anonymity. In: Chan TM (ed) Proceedings of the thirtieth annual ACM-SIAM symposium on discrete algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019. SIAM, pp 2468–2479

  • Erlingsson Ú, Feldman V, Mironov I, et al (2020) Encode, shuffle, analyze privacy revisited: formalizations and empirical evaluation. CoRR abs/2001.03618

  • Erlingsson U, Pihur V, Korolova A (2014) Rappor: randomized aggregatable privacy-preserving ordinal response. In: Proceedings of the 2014 ACM SIGSAC conference on computer and communications security. Association for computing machinery, New York, NY, USA, CCS ’14, pp 1054–1067

  • Feldman V, McMillan A, Talwar K (2021) Hiding among the clones: a simple and nearly optimal analysis of privacy amplification by shuffling. In: 2021 IEEE 62nd annual symposium on foundations of computer science (FOCS). IEEE, pp 954–964

  • Geyer RC, Klein T, Nabi M (2017) Differentially private federated learning: a client level perspective. CoRR abs/1712.07557

  • Ghazi B, Kumar R, Manurangsi P, et al (2021) Differentially private aggregation in the shuffle model: almost central accuracy in almost a single message. In: Meila M, Zhang T (eds) Proceedings of the 38th international conference on machine learning, ICML 2021, 18-24 July 2021, Virtual event, proceedings of machine learning research, vol 139. PMLR, pp 3692–3701

  • Ghazi B, Pagh R, Velingker A (2019) Scalable and differentially private distributed aggregation in the shuffled model. CoRR abs/1906.08320

  • Girgis AM, Data D, Diggavi SN, et al (2021b) Shuffled model of differential privacy in federated learning. In: The 24th International conference on artificial intelligence and statistics, AISTATS 2021, April 13-15, 2021, Virtual event, proceedings of machine learning research, vol 130. PMLR, pp 2521–2529

  • Girgis AM, Data D, Diggavi SN (2021a) Differentially private federated learning with shuffling and client self-sampling. In: IEEE international symposium on information theory, ISIT 2021, Melbourne, Australia, July 12-20, 2021. IEEE, pp 338–343

  • Girgis AM, Data D, Diggavi SN et al (2021) Shuffled model of federated learning: privacy, accuracy and communication trade-offs. IEEE J Sel Areas Inf Theory 2(1):464–478

  • Li Q, Diao Y, Chen Q, et al (2022) Federated learning on non-iid data silos: an experimental study. In: 2022 IEEE 38th international conference on data engineering (ICDE), pp 965–978

  • Liew SP, Hasegawa S, Takahashi T (2022) Shuffled check-in: privacy amplification towards practical distributed learning. CoRR abs/2206.03151

  • Liu R, Cao Y, Chen H, et al (2021) FLAME: differentially private federated learning in the shuffle model. In: Thirty-Fifth AAAI conference on artificial intelligence, AAAI 2021, thirty-third conference on innovative applications of artificial intelligence, IAAI 2021, The eleventh symposium on educational advances in artificial intelligence, EAAI 2021, Virtual event, February 2-9, 2021. AAAI Press, pp 8688–8696

  • Liu R, Cao Y, Yoshikawa M, et al (2020) Fedsel: federated SGD under local differential privacy with top-k dimension selection. In: Nah Y, Cui B, Lee S, et al (eds) Database systems for advanced applications - 25th International conference, DASFAA 2020, Jeju, South Korea, September 24-27, 2020, Proceedings, part I, lecture notes in computer science, vol 12112. Springer, pp 485–501

  • McMahan HB, Ramage D, Talwar K, et al (2018) Learning differentially private recurrent language models. In: 6th International conference on learning representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference track proceedings. OpenReview.net

  • McMahan B, Moore E, Ramage D, et al (2017) Communication-efficient learning of deep networks from decentralized data. In: Singh A, Zhu XJ (eds) Proceedings of the 20th international conference on artificial intelligence and statistics, AISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA, Proceedings of machine learning research, vol 54. PMLR, pp 1273–1282

  • Nasr M, Shokri R, Houmansadr A (2019) Comprehensive privacy analysis of deep learning: passive and active white-box inference attacks against centralized and federated learning. In: 2019 IEEE symposium on security and privacy, SP 2019, San Francisco, CA, USA, May 19-23, 2019. IEEE, pp 739–753

  • Papernot N, Abadi M, Erlingsson Ú, et al (2017) Semi-supervised knowledge transfer for deep learning from private training data. In: 5th International conference on learning representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference track proceedings

  • Qu Y, Gao L, Xiang Y et al (2022) Fedtwin: blockchain-enabled adaptive asynchronous federated learning for digital twin networks. IEEE Network 36(6):183–190

  • Salehi B, Gu J, Roy D, et al (2022) Flash: federated learning for automated selection of high-band mmwave sectors. In: IEEE INFOCOM 2022 - IEEE conference on computer communications. IEEE Press, pp 1719–1728

  • Scott M, Cormode G, Maple C (2022) Aggregation and transformation of vector-valued messages in the shuffle model of differential privacy. IEEE Trans Inf Forensics Secur 17:612–627

  • Sun L, Qian J, Chen X (2021) Ldp-fl: Practical private aggregation in federated learning with local differential privacy. In: Zhou ZH (ed) Proceedings of the thirtieth international joint conference on artificial intelligence, IJCAI-21. International joint conferences on artificial intelligence organization, pp 1571–1578

  • Team ADP (2017) Learning with privacy at scale. Apple Machine Learn J 1:1

  • Truex S, Baracaldo N, Anwar A, et al (2019) A hybrid approach to privacy-preserving federated learning. In: Proceedings of the 12th ACM workshop on artificial intelligence and security. Association for computing machinery, New York, NY, USA, AISec’19, pp 1–11

  • Truex S, Liu L, Chow KH, et al (2020) Ldp-fed: Federated learning with local differential privacy. In: Proceedings of the third ACM international workshop on edge systems, analytics and networking. Association for computing machinery, New York, NY, USA, EdgeSys ’20, pp 61–66

  • Wang N, Xiao X, Yang Y, et al (2019) Collecting and analyzing multidimensional data with local differential privacy. In: 35th IEEE international conference on data engineering, ICDE 2019, Macao, China, April 8-11, 2019. IEEE, pp 638–649

  • Wei K, Li J, Ding M et al (2020) Federated learning with differential privacy: algorithms and performance analysis. IEEE Trans Inf Forensics Secur 15:3454–3469

  • Wu G, Chen X, Gao Z et al (2024) Privacy-preserving offloading scheme in multi-access mobile edge computing based on MADRL. J Parallel Distributed Comput 183:104775

  • Wu G, Chen X, Shen Y et al (2024) Combining lyapunov optimization with actor-critic networks for privacy-aware iiot computation offloading. IEEE Internet Things J 11(10):17437–17452

  • Wu N, Farokhi F, Smith D, et al (2020) The value of collaboration in convex machine learning with differential privacy. In: 2020 IEEE Symposium on Security and Privacy (SP), pp 304–317

  • Yang Q, Liu Y, Chen T et al (2019) Federated machine learning: concept and applications. ACM Trans Intell Syst Technol 10(2):12:1-12:19

  • Yang Q, Du X, Liu A et al (2023) Adastopk: adaptive federated shuffle model based on differential privacy. Inf Sci 642:119186

  • Zhu L, Liu Z, Han S (2019) Deep leakage from gradients. In: Wallach HM, Larochelle H, Beygelzimer A, et al (eds) Advances in neural information processing systems 32: Annual conference on neural information processing systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp 14747–14756

Acknowledgements

The authors would like to thank the editor and anonymous referees for their constructive comments.

Funding

This work was supported by the National Natural Science Foundation of China under grant No. 62102449, the Central Plains Talent Program under grant No. 224200510003, and the Key Research and Development and Promotion Program of Henan Province under grant No. 222102210069.

Author information

Contributions

Qiantao Yang: Scheme design, experimentation, writing - original draft. Xuehui Du: Conceptualization, formal analysis, writing - review and editing, supervision. Xiangyu Wu, Aodi Liu, Wenjuan Wang, and Shihao Wang: Software, experimentation, writing.

Corresponding author

Correspondence to Xuehui Du.

Ethics declarations

Competing interests

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Yang, Q., Du, X., Wu, X. et al. Fedrtid: an efficient shuffle federated learning via random participation and adaptive time constraint. Cybersecurity 7, 76 (2024). https://doi.org/10.1186/s42400-024-00293-x

Keywords