A privacy-preserving decentralized randomized block-coordinate subgradient algorithm over time-varying networks☆
Introduction
The problem of distributed optimization has received significant interest in academia due to its prevalence. Examples include distributed tracking, estimation, and detection in sensor networks (Kar and Moura, 2010, Kar et al., 2012, Lesser et al., 2003, Rabbat and Nowak, 2004), distributed learning and regression in machine learning and control (Bekkerman et al., 2011, Belomestny et al., 2010, Cavalcante et al., 2009, Franklin, 2005), multi-agent coordination in multi-agent systems (Olfati-Saber, Fax, & Murray, 2007), resource allocation in communication networks (Beck et al., 2014, Johansson, 2008), and distributed power control in smart grids (Chang, Nedić, & Scaglione, 2014). Solving these problems requires efficient optimization algorithms that operate in a distributed and cooperative fashion without any centralized coordination. Although iterative learning control (Jiang et al., 2022, Tao et al., 2020, Tao et al., 2021) is an effective method for optimal control, it requires output error information and is implemented in a centralized setting. In this paper, we consider more general optimization problems in a decentralized setting without output error information. We therefore focus on the design of general decentralized optimization algorithms over networks that rely only on local communication and computation, where each agent uses only its own information and the information received from its connected neighbors.
Distributed optimization algorithms were originally introduced by Tsitsiklis (1984) (see also Bertsekas and Tsitsiklis (1997) and Tsitsiklis, Bertsekas, and Athans (1986)), and a large number of studies have been devoted to them in recent years (Duchi et al., 2012, Lee and Nedić, 2013, Nedić et al., 2008, Nedić and Ozdaglar, 2009, Nedić et al., 2010, Xi and Khan, 2017, Zhu et al., 2017, Zhu et al., 2019, Zhu et al., 2018). In these algorithms, the update direction is the negative (sub)gradient of the local functions, so the entire (sub)gradient needs to be computed at each iteration. The per-iteration computational complexity is therefore proportional to the complexity of evaluating the corresponding local function, which creates a computational bottleneck when dealing with high-dimensional data. As a result, distributed (sub)gradient algorithms become prohibitively expensive for solving high-dimensional optimization problems.
To alleviate the cost of computing entire (sub)gradient vectors, coordinate descent (Bertsekas, 2016, Yang et al., 2020) is one of the most important methods due to its simplicity: at each iteration, only a block of coordinates is chosen and updated. Hence, the per-iteration computational complexity of coordinate descent methods is lower than that of conventional (sub)gradient descent methods. The main difference among coordinate descent methods lies in the criterion used to choose the coordinates. Maximal and cyclic coordinate searches have often been used (Bertsekas, 2016). However, convergence is challenging to prove for cyclic coordinate search (Nesterov, 2012), and the convergence rate of maximal coordinate search is weak (Bertsekas, 2016). For this reason, Nesterov (2012) studied randomized block-coordinate descent, in which the coordinates are chosen at random, and also established its convergence rate. Soon afterwards, Richtárik and Takáč (2014) extended this method to composite functions. In addition, Li et al. (2018) presented a block-coordinate algorithm for the continuous submodular maximization problem, and parallel coordinate descent methods have been studied by Richtárik and Takáč (2016).
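As a point of reference, a minimal centralized sketch of randomized block-coordinate descent of the kind Nesterov (2012) analyzes might look as follows. The objective, block partition, and step size below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def rbcd(grad, x0, n_blocks, step, iters, seed=None):
    """Randomized block-coordinate descent: at each iteration only one
    uniformly random block of coordinates is updated, so the per-iteration
    cost is roughly 1/n_blocks of a full gradient step."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    blocks = np.array_split(np.arange(x.size), n_blocks)
    for _ in range(iters):
        b = blocks[rng.integers(n_blocks)]  # pick one block at random
        # For clarity we evaluate the full gradient and slice it; a real
        # block oracle would compute only the b-th block of the gradient.
        x[b] -= step * grad(x)[b]
    return x

# Toy problem: minimize f(x) = 0.5 * ||x - c||^2, whose minimizer is x* = c.
c = np.array([1.0, -2.0, 3.0, 0.5])
x = rbcd(lambda x: x - c, np.zeros(4), n_blocks=2, step=0.5, iters=200, seed=0)
```

Each block is updated roughly half the time, so each block update contracts its error by a factor of 0.5 and the iterate converges to `c`.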
However, the above-mentioned methods are centralized. To remove this limitation, distributed variants of coordinate descent methods have been investigated in recent years (Necoara, 2013, Notarnicola et al., 2017, Notarnicola et al., 2021, Wang et al., 2018). Specifically, Necoara (2013) developed randomized coordinate descent methods for network optimization with a smooth convex loss function and linearly coupled constraints; however, every agent needs to know the loss function, and the blocks are selected by a random rule. Wang et al. (2018) studied coordinate-descent diffusion learning for unconstrained optimization over networks with strongly convex and smooth local loss functions. For general constrained optimization problems, Notarnicola et al., 2017, Notarnicola et al., 2021 proposed the distributed Block-SONATA algorithm, which selects one block in a cyclic rule; nevertheless, its convergence rate was not established explicitly. All of the above algorithms assume that the loss functions are smooth.
Despite this progress, the local loss functions of agents may be nonsmooth in practical applications. For this reason, we focus on the case in which the local loss function of each agent is potentially nonsmooth. Moreover, privacy leakage is an important problem in the training of deep learning models, and various privacy-preserving distributed algorithms have been proposed in recent years to address it (Mao et al., 2021, Zhang and Wang, 2019, Zhu et al., 2018). These methods require the computation of the full (sub)gradient, which becomes prohibitively expensive when massive data sets are used to train deep neural networks. How to design and analyze a private decentralized coordinate descent algorithm for nonsmooth local loss functions therefore remains an open problem.
For this reason, we propose a privacy-preserving decentralized randomized block-coordinate subgradient projection algorithm that uses only local communication and computation over time-varying networks. The proposed algorithm can tackle private optimization problems in time-varying networks where the data are high dimensional and the loss function of each agent is potentially nonsmooth and known only locally. In our previous works (Xu et al., 2021, Zhu et al., 2021, Zhu et al., 2017, Zhu et al., 2019), each agent still uses the full subgradient to update its decision. However, computing the full subgradient is prohibitively expensive when training deep learning models on massive data sets; moreover, the data may be dispersed over the agents of the network, which further hinders this computation. Although a block-coordinate method is also used in our previous work (Li et al., 2018), the loss function there is a submodular function and the proposed algorithm is not implementable in a decentralized way. On the other hand, Zhu et al. (2018) proposed a private distributed algorithm that protects the privacy of data by using a differential privacy mechanism; the privacy level, however, unavoidably compromises the optimality of the estimated parameter. In contrast to our previous works, each agent in this paper accesses only a random block of the full subgradient vector, which significantly reduces the computational burden, and the proposed algorithm can be implemented in a decentralized setting. Meanwhile, we adopt partially homomorphic cryptography to protect the privacy of data, so the proposed algorithm does not compromise the optimal solution.
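To illustrate the partially homomorphic primitive this approach relies on, the following is a toy additively homomorphic Paillier cryptosystem. The tiny primes make it completely insecure and are for demonstration only; the paper does not prescribe this particular implementation. The key property is that multiplying ciphertexts adds the underlying plaintexts, so an aggregator can sum encrypted values without ever seeing them:

```python
import math
import random

# Toy Paillier keypair (insecure demo parameters).
p, q = 17, 19
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)   # Carmichael function lambda(n)
mu = pow(lam, -1, n)           # valid because we take g = n + 1

_rng = random.Random(0)

def encrypt(m):
    """Encrypt m (0 <= m < n) as (1+n)^m * r^n mod n^2 with random r."""
    while True:
        r = _rng.randrange(2, n)
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(c):
    """Recover m via L(c^lambda mod n^2) * mu mod n, L(u) = (u-1)/n."""
    L = (pow(c, lam, n2) - 1) // n
    return L * mu % n

# Additive homomorphism: multiplying ciphertexts adds plaintexts.
c_sum = encrypt(12) * encrypt(30) % n2
assert decrypt(c_sum) == 42
```

In a deployment one would use a vetted library with large keys; the point here is only the algebraic property that enables private aggregation.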
The main contributions of the work are four-fold:
- •
We propose a privacy-preserving decentralized randomized block-coordinate subgradient projection algorithm to solve large-scale constrained optimization problems over time-varying networks. Only a random subset of the coordinates of the subgradient vector is chosen by each agent. Meanwhile, partially homomorphic cryptography is employed to protect the privacy of devices.
- •
We show that the decentralized randomized block-coordinate subgradient projection algorithm is asymptotically convergent. Meanwhile, we prove that the proposed algorithm can protect the privacy of data.
- •
We also rigorously analyze the rate of convergence, establishing explicit convergence rates in terms of the number of iterations under both local strong convexity and local convexity.
- •
We verify the performance of the proposed method. Further, we also confirm its computational benefits through simulation on two real-world datasets.
Notation: We use standard notation for the transpose operation, the Euclidean and other vector norms, the vector of ones, the identity matrix, the probability of a random variable and its expectation, and diagonal matrices. A vector with nonnegative entries that sum to one is said to be a stochastic vector. A square matrix is said to be doubly stochastic if it is both column stochastic and row stochastic, that is, if each of its columns and rows is a stochastic vector.
Problem setup, algorithm design, and assumptions
In this section, we first propose the decentralized randomized block-coordinate subgradient projection algorithm to solve the problem of interest, and then provide some assumptions.
Each agent has a local loss function, which is potentially nonsmooth, and all agents share a common constraint set. The problem of distributed constrained optimization can then be formulated as minimizing, over the constraint set, the sum of the local loss functions of all agents.
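To make the setup concrete, the following is one possible sketch of a decentralized randomized block-coordinate subgradient projection method for this kind of problem; it is not the paper's exact algorithm (and omits the cryptographic layer). The box constraint, the ℓ1 losses, and the static doubly stochastic mixing matrix standing in for the time-varying sequence are all illustrative assumptions:

```python
import numpy as np

def project_box(x, lo=-5.0, hi=5.0):
    # Projection onto a box constraint (an illustrative choice of X).
    return np.clip(x, lo, hi)

def decentralized_bcd(subgrads, d, n_blocks, steps, W_seq, seed=0):
    """Each iteration: every agent mixes neighbor estimates with a doubly
    stochastic matrix from W_seq, takes a subgradient step on one uniformly
    random block of coordinates, and projects back onto the constraint set."""
    rng = np.random.default_rng(seed)
    m = len(subgrads)                     # number of agents
    X = np.zeros((m, d))                  # row i = agent i's estimate
    blocks = np.array_split(np.arange(d), n_blocks)
    for t, alpha in enumerate(steps):
        X = W_seq[t % len(W_seq)] @ X     # consensus (mixing) step
        for i in range(m):
            b = blocks[rng.integers(n_blocks)]
            g = subgrads[i](X[i])         # local subgradient (sliced below)
            X[i, b] -= alpha * g[b]       # update only the chosen block
            X[i] = project_box(X[i])
    return X

# Toy problem: f_i(x) = ||x - c_i||_1 (nonsmooth). The minimizer of the sum
# is the coordinatewise median of the c_i, here (1, 2).
c = np.array([[1.0, 2.0], [1.0, 2.0], [3.0, 4.0]])
subgrads = [lambda x, ci=ci: np.sign(x - ci) for ci in c]
W = [np.full((3, 3), 1.0 / 3)]            # one doubly stochastic matrix
steps = [1.0 / np.sqrt(t + 1) for t in range(2000)]
X = decentralized_bcd(subgrads, d=2, n_blocks=2, steps=steps, W_seq=W, seed=1)
```

With diminishing steps, the agents' estimates hover near the coordinatewise median while the mixing step keeps them in consensus.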
To describe a
Main results
Some main results are summarized in this section. First, we present the convergence of our algorithm under a suitable step size sequence. The result is stated as follows.
Theorem 1 Under Assumptions 1–5, let the sequences of estimates be generated by Eqs. (3) and (4), where each agent randomly chooses its block from the given set of blocks. Moreover, assume that the step size sequence satisfies the stated conditions for all iterations. Then, the sequence of each agent converges to an optimal solution with probability 1.
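The step-size conditions of Theorem 1 are not reproduced in this snippet; for subgradient methods of this type they are typically the standard diminishing conditions, stated here as an assumption rather than as a quotation of the theorem:

```latex
% Typical diminishing step-size conditions for subgradient methods
% (an assumption about the omitted conditions, not the paper's exact statement):
\alpha_k > 0, \qquad \sum_{k=0}^{\infty} \alpha_k = \infty, \qquad
\sum_{k=0}^{\infty} \alpha_k^2 < \infty, \qquad
\text{e.g. } \alpha_k = \frac{c}{k+1} \text{ for some } c > 0.
```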
Analysis of convergence-related performance
This section is dedicated to the analysis of convergence. For this purpose, we first present a key result for our analysis, which is stated as follows.
Lemma 1 If each agent randomly chooses its block from the given set of blocks, then Assumption 1 holds.
Proof The proof is similar to that of Zhang and Wang (2019) and is therefore omitted. □
Besides, consider the scalar versions of the variables in Eqs. (2)–(3) for all agents. Each agent then updates its estimate as follows:
Applications of decentralized optimization
In this section, we present some examples in applications, in which the problems can be solved by the proposed algorithm.
Numerical simulations
We present simulations in this section to evaluate the performance of the proposed algorithm. To this end, we applied our algorithm to a multiclass classification problem. In our simulations, each data example was available only to a single agent and belonged to one of a fixed number of classes. The local loss function of each agent was defined in terms of a decision matrix. The constraint set of this multiclass
Conclusion
In this paper, we proposed a privacy-preserving decentralized randomized block-coordinate subgradient projection algorithm over time-varying networks, in which each agent updates only a random subset of the coordinates of its estimate. We proved that the proposed algorithm converges with probability 1. Furthermore, we established explicit rates of convergence under strong convexity and under convexity, respectively. Besides, we also showed that the proposed algorithm can protect the privacy of data.
CRediT authorship contribution statement
Lin Wang: Conceptualization, Investigation, Writing – review & editing, Funding acquisition. Mingchuan Zhang: Writing – review & editing, Methodology, Software, Funding acquisition. Junlong Zhu: Formal analysis, Visualization, Funding acquisition. Ling Xing: Supervision, Validation. Qingtao Wu: Resources, Project administration.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (49)
- Value iteration and adaptive optimal output regulation with assured convergence rate. Control Engineering Practice (2022).
- Stochastic block-coordinate gradient projection algorithms for submodular maximization. Complexity (2018).
- An O(1/k) gradient method for network resource allocation problems. IEEE Transactions on Control of Network Systems (2014).
- Regression methods for stochastic control problems and their convergence analysis. SIAM Journal on Control and Optimization (2010).
- Nonlinear programming (3rd ed.) (2016).
- Parallel and distributed computation: Numerical methods (1997).
- Probability and measure (2012).
- Self-organization in biological systems (2003).
- An adaptive projected subgradient approach to learning in diffusion networks. IEEE Transactions on Signal Processing (2009).
- Distributed constrained optimization by consensus-based primal-dual perturbation method. IEEE Transactions on Automatic Control.
- Dual averaging for distributed optimization: Convergence analysis and network scaling. IEEE Transactions on Automatic Control.
- The elements of statistical learning: Data mining, inference and prediction. The Mathematical Intelligencer.
- Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Transactions on Automatic Control.
- On distributed optimization in networked systems.
- Distributed consensus algorithms in sensor networks: Quantized data and random link failures. IEEE Transactions on Signal Processing.
- Distributed parameter estimation in sensor networks: Nonlinear observation models and imperfect communication. IEEE Transactions on Information Theory.
- Distributed random projection algorithm for convex optimization. IEEE Journal of Selected Topics in Signal Processing.
- Distributed sensor networks: A multiagent perspective.
- A privacy preserving distributed optimization algorithm for economic dispatch over time-varying directed networks. IEEE Transactions on Industrial Informatics.
- Random coordinate descent algorithms for multi-agent convex optimization over networks. IEEE Transactions on Automatic Control.
- On stochastic subgradient mirror-descent algorithm with weighted averaging. SIAM Journal on Optimization.
- Stochastic gradient-push for strongly convex functions on time-varying directed graphs. IEEE Transactions on Automatic Control.
☆ This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants No. 62002102, No. 61971458, and No. 61976243; in part by the Leading Talents of Science and Technology in the Central Plain of China under Grant No. 224200510004; and in part by the Scientific and Technological Innovation Talents of Colleges and Universities in Henan Province, China under Grant No. 22HASTIT014.