A privacy-preserving decentralized randomized block-coordinate subgradient algorithm over time-varying networks

https://doi.org/10.1016/j.eswa.2022.118099

Highlights

  • This paper proposes a privacy-preserving decentralized randomized block-coordinate subgradient projection algorithm.

  • It proves that the proposed algorithm is asymptotically convergent.

  • It proves that the proposed algorithm can protect the privacy of data.

  • It establishes the rates of convergence, i.e., O(log K/K) under local strong convexity and O(log K/√K) under local convexity.

Abstract

This study considers a constrained huge-scale optimization problem over networks in which the objective is to minimize the sum of nonsmooth local loss functions. Many optimization algorithms based on (sub)gradient descent have been proposed for such problems, but computing the entire (sub)gradient becomes a computational bottleneck when the data are high dimensional. To reduce the computational burden of each agent and preserve the privacy of data in time-varying networks, we propose a privacy-preserving decentralized randomized block-coordinate subgradient projection algorithm, in which a random subset of the coordinates of the subgradient vector is chosen to update the optimized parameter and partially homomorphic cryptography is used to protect the privacy of data. Further, we prove that our algorithm is asymptotically convergent. Moreover, the rates of convergence are established under appropriate step sizes, i.e., O(log K/K) under local strong convexity and O(log K/√K) under local convexity, in which K represents the number of iterations. Meanwhile, we show that the privacy of data can be protected by the proposed algorithm. Experiments on two real-world datasets demonstrate the computational benefit of our algorithm, and the theoretical results are also verified by different experiments.

Introduction

The problem of distributed optimization has received a significant amount of interest in academia due to its prevalence. Examples include problems of distributed tracking, estimation, and detection in sensor networks (Kar and Moura, 2010, Kar et al., 2012, Lesser et al., 2003, Rabbat and Nowak, 2004), distributed learning and regression problems in machine learning and control (Bekkerman et al., 2011, Belomestny et al., 2010, Cavalcante et al., 2009, Franklin, 2005), multi-agent coordination problems in multi-agent systems (Olfati-Saber, Fax, & Murray, 2007), resource allocation problems in communication networks (Beck et al., 2014, Johansson, 2008), and problems of distributed power control in smart grids (Chang, Nedić, & Scaglione, 2014). To solve these problems, we need to design efficient optimization algorithms that work in a distributed and cooperative fashion without any centralized coordination. Although iterative learning control (Jiang et al., 2022, Tao et al., 2020, Tao et al., 2021) is an effective method for optimal control, it requires output error information and is implemented in a centralized setting. In this paper, we consider more general optimization problems without output error information in a decentralized setting. For this reason, we focus here on the design of general decentralized optimization algorithms over networks through local communication and computation, where each agent utilizes only its own information and the information received from connected neighbors.

Distributed optimization algorithms were originally introduced by Tsitsiklis (1984) (see also Bertsekas and Tsitsiklis (1997) and Tsitsiklis, Bertsekas, and Athans (1986)), and a large number of studies have been devoted to them in recent years (Duchi et al., 2012, Lee and Nedić, 2013, Nedić et al., 2008, Nedić and Ozdaglar, 2009, Nedić et al., 2010, Xi and Khan, 2017, Zhu et al., 2017, Zhu et al., 2019, Zhu et al., 2018). In these algorithms, the updating direction is along the negative (sub)gradient of the local functions. Nevertheless, the entire (sub)gradient needs to be computed at each iteration, and the computational complexity of this step is proportional to the complexity of computing the corresponding local function value, which leads to a computational bottleneck when dealing with high-dimensional data. Thereby, distributed (sub)gradient algorithms become prohibitively expensive for solving high-dimensional optimization problems.

To alleviate the computational complexity of computing entire (sub)gradient vectors, coordinate descent (Bertsekas, 2016, Yang et al., 2020) is one of the most important methods due to its simplicity: a block of coordinates is chosen to be updated at each iteration. Hence, the computational complexity of coordinate descent methods is lower than that of conventional (sub)gradient descent methods. The main difference among coordinate descent methods lies in the criteria used to choose the coordinates. Maximal and cyclic coordinate searches have often been used (Bertsekas, 2016). However, convergence is challenging to prove for cyclic coordinate search (Nesterov, 2012), and only trivial convergence rates are available for maximal coordinate search (Bertsekas, 2016). For this reason, Nesterov (2012) studied randomized block-coordinate descent, wherein the choice of coordinates is random, and also established its convergence rate. Soon afterwards, Richtárik and Takáč (2014) extended this method to composite functions. Besides, Li et al. (2018) presented a block-coordinate algorithm to solve the continuous submodular maximization problem. Parallel coordinate descent methods have also been studied by Richtárik and Takáč (2016).
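To make this idea concrete, the following minimal sketch (our own Python/NumPy illustration; the loss, block size, and step size are placeholders rather than the ones used later in the paper) performs one randomized block-coordinate subgradient step on a small nonsmooth problem.

    import numpy as np

    def randomized_block_subgradient_step(w, subgrad_fn, step_size, block_size, rng):
        # One randomized block-coordinate subgradient step: only a randomly chosen
        # block of coordinates is updated, which lowers the per-iteration cost
        # compared with a full (sub)gradient update.
        d = w.shape[0]
        block = rng.choice(d, size=block_size, replace=False)  # random coordinate block
        g = subgrad_fn(w)   # full subgradient shown for clarity; only g[block] is needed
        w_new = w.copy()
        w_new[block] -= step_size * g[block]
        return w_new

    # Placeholder nonsmooth problem for illustration: f(w) = 0.5*||A w - b||^2 + ||w||_1,
    # with subgradient A^T (A w - b) + sign(w).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 10))
    b = rng.standard_normal(20)
    subgrad = lambda w: A.T @ (A @ w - b) + np.sign(w)

    w = np.zeros(10)
    for k in range(1, 101):
        w = randomized_block_subgradient_step(w, subgrad, step_size=1.0 / k, block_size=3, rng=rng)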

However, the above-mentioned methods are implemented in a centralized setting. To remove this limitation, distributed variants of coordinate descent methods have been investigated in recent years (Necoara, 2013, Notarnicola et al., 2017, Notarnicola et al., 2021, Wang et al., 2018). Specifically, Necoara (2013) developed randomized coordinate descent methods for network optimization with a smooth and convex loss function and linearly coupled constraints. Moreover, all agents need to know the loss function and select the block by using a random rule. Besides, Wang et al. (2018) studied coordinate-descent diffusion learning to solve the problem of unconstrained optimization over networks with strongly convex and smooth local loss functions. For general constrained optimization problems, Notarnicola et al., 2017, Notarnicola et al., 2021 proposed the Block-SONATA distributed algorithm, which selects one block by a cyclic rule. Nevertheless, its convergence rate is not established explicitly. All of the above-mentioned algorithms assume that the loss functions are smooth.

Despite this progress, however, the local loss functions of agents may be nonsmooth in practical applications. For this reason, we focus on the case in which the local loss function of each agent is potentially nonsmooth. Besides, privacy leakage is an important problem in the training process of deep learning models. To address this problem, various privacy-preserving distributed algorithms have been proposed in recent years (Mao et al., 2021, Zhang and Wang, 2019, Zhu et al., 2018). In these methods, the computation of the full (sub)gradient is essential, which becomes prohibitively expensive when massive data sets are employed in the training of deep neural networks. Thereby, how to design and analyze private decentralized coordinate descent algorithms for nonsmooth local loss functions remains an open problem.

For this reason, we propose a privacy-preserving decentralized randomized block-coordinate subgradient projection algorithm that relies only on local communication and computation in time-varying networks. The proposed algorithm can tackle private optimization problems over time-varying networks in which the data are high dimensional, and the loss function of each agent is potentially nonsmooth and locally known. In our previous works (Xu et al., 2021, Zhu et al., 2021, Zhu et al., 2017, Zhu et al., 2019), each agent still utilizes the full subgradient to update its decision. Nonetheless, the computation of the full subgradient is prohibitively expensive when training deep learning models on massive data sets. Moreover, the data may be dispersed over the agents of the network, which is a hindrance to the computation of the full subgradient. Although the block-coordinate method is also used in our previous work (Li et al., 2018), the loss function there is a submodular function and the proposed algorithm is not implementable in a decentralized way. On the other hand, the work (Zhu et al., 2018) proposed a private distributed algorithm that protects the privacy of data by using a differential privacy mechanism. Nevertheless, the privacy level unavoidably compromises the optimality of the estimated parameter. Compared with our previous works, each agent in this paper has access to only a random block of the full subgradient vector, which significantly reduces the computational burden, and the proposed algorithm can be implemented in a decentralized setting. Meanwhile, we also adopt partially homomorphic cryptography to protect the privacy of data. Moreover, the proposed algorithm does not compromise the optimality of the solution.
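Partially homomorphic cryptography is the privacy primitive adopted here: ciphertexts can be combined without decryption, so an agent can aggregate its neighbors' weighted states without seeing them in the clear. The sketch below illustrates only this additive-homomorphism primitive using the third-party python-paillier (phe) package; it is a simplified illustration under our own assumptions, not the paper's full protocol.

    from phe import paillier

    # Agent i generates a key pair and shares only the public key with its neighbors.
    public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

    # Each neighbor encrypts its weighted state with agent i's public key.
    neighbor_states = [0.7, -1.2, 0.4]   # placeholder scalar states
    weights = [0.3, 0.3, 0.4]            # placeholder mixing weights
    encrypted = [public_key.encrypt(w * x) for w, x in zip(weights, neighbor_states)]

    # Ciphertexts can be added without decryption (additive homomorphism),
    # so the aggregate is formed while the individual states stay private.
    encrypted_sum = encrypted[0]
    for c in encrypted[1:]:
        encrypted_sum = encrypted_sum + c

    # Only agent i, holding the private key, recovers the weighted sum.
    aggregate = private_key.decrypt(encrypted_sum)   # ~ 0.3*0.7 + 0.3*(-1.2) + 0.4*0.4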

The main contributions of the work are four-fold:

  • We propose a privacy-preserving decentralized randomized block-coordinate subgradient projection algorithm to solve problems of large-scale constrained optimization over time-varying networks. Moreover, only a random subset of the coordinates of the subgradient vector is chosen by each agent. Meanwhile, partially homomorphic cryptography is employed to protect the privacy of devices.

  • We show that the decentralized randomized block-coordinate subgradient projection algorithm is asymptotically convergent. Meanwhile, we prove that the proposed algorithm can protect the privacy of data.

  • We also rigorously analyze the rate of convergence. Under local strong convexity, a convergence rate of O(log K/K) is achieved; under local convexity, a convergence rate of O(log K/√K) is achieved, in which K denotes the number of iterations.

  • We verify the performance of the proposed method. Further, we also confirm its computational benefits through simulation on two real-world datasets.

Notation: The notation (·)^⊤ denotes the transpose operation. The Euclidean norm and 1-norm of a vector x are designated by ‖x‖ and ‖x‖₁, respectively. The vector of all ones is denoted by 1, and an n×n identity matrix is denoted by I_n. The probability of a random variable is denoted by P and its expectation by E. A vector x ∈ ℝ^d with nonnegative entries that sum to one is said to be a stochastic vector. A square n×n matrix A is said to be doubly stochastic if it is both column stochastic and row stochastic, that is, every column and every row of A is a stochastic vector. Besides, diag(·) denotes a diagonal matrix.
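As a small numerical illustration of these last two definitions (a hypothetical 3×3 example, not taken from the paper), the following sketch checks the stochastic-vector and doubly stochastic properties.

    import numpy as np

    def is_stochastic_vector(x, tol=1e-12):
        # Nonnegative entries that sum to one.
        return bool(np.all(x >= 0)) and abs(x.sum() - 1.0) < tol

    def is_doubly_stochastic(A, tol=1e-12):
        # Every row and every column of A is a stochastic vector.
        rows_ok = all(is_stochastic_vector(r, tol) for r in A)
        cols_ok = all(is_stochastic_vector(c, tol) for c in A.T)
        return rows_ok and cols_ok

    A = np.array([[0.50, 0.25, 0.25],
                  [0.25, 0.50, 0.25],
                  [0.25, 0.25, 0.50]])
    print(is_doubly_stochastic(A))   # True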

Section snippets

Problem setup, algorithm design, and assumptions

In this section, we first propose the decentralized randomized block-coordinate subgradient projection algorithm to solve the problem of interest, and then provide some assumptions.

The local loss function of agent i is denoted by ψ_i : ℝ^d → ℝ, which is potentially nonsmooth, and K ⊆ ℝ^d denotes the constraint set. Then, the problem of distributed constrained optimization can be formulated as follows:

    minimize Ψ(w) = Σ_{i=1}^{m} ψ_i(w),   subject to w ∈ K,

in which m denotes the number of agents.
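To make the problem structure concrete, the following minimal sketch (our own illustration; the local losses ψ_i and the constraint set K are placeholders, with K taken to be a Euclidean ball) evaluates the network objective Ψ(w) = Σ_{i=1}^{m} ψ_i(w) and keeps a point feasible by projecting it onto K.

    import numpy as np

    # Local (possibly nonsmooth) losses, one per agent -- placeholders for the ψ_i.
    local_losses = [
        lambda w: np.abs(w).sum(),        # ψ_1(w) = ||w||_1
        lambda w: np.max(np.abs(w)),      # ψ_2(w) = ||w||_inf
        lambda w: 0.5 * np.dot(w, w),     # ψ_3(w) = 0.5 ||w||^2
    ]

    def global_objective(w):
        # Ψ(w) = sum_i ψ_i(w): only the sum over all agents defines the network problem.
        return sum(psi(w) for psi in local_losses)

    def project_onto_ball(w, radius=1.0):
        # Projection onto K = {w : ||w|| <= radius}, one concrete choice of constraint set.
        norm = np.linalg.norm(w)
        return w if norm <= radius else (radius / norm) * w

    w = project_onto_ball(np.array([2.0, -1.0, 0.5]))
    print(global_objective(w))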

To describe a

Main results

Some main results are summarized in this section. First, we present the convergence of our algorithm under a suitable step size sequence η_k. The result is stated as follows.

Theorem 1

Under Assumptions 1–5, let the sequence {w_i^k}, i ∈ V, be generated by Eqs. (3) and (4). Each agent randomly chooses β_ij(k) from {α, (1−α)/(m−1)}, where α ∈ (0, 1/m). Moreover, assume that the step size sequence η_k satisfies the following conditions: Σ_{k=1}^{∞} η_k = ∞, Σ_{k=1}^{∞} η_k^2 < ∞, and η_k ≤ η_s for all k > s ≥ 1. Then, each sequence {w_i^k} converges to w* with
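For instance, the diminishing step size η_k = c/k with a constant c > 0, a standard choice used here purely for illustration (the theorem does not prescribe a particular sequence), satisfies all three conditions: Σ_{k=1}^{∞} c/k = ∞, Σ_{k=1}^{∞} (c/k)^2 = c^2 π^2/6 < ∞, and η_k = c/k ≤ c/s = η_s for all k > s ≥ 1.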

Analysis of convergence-related performance

This section is dedicated to the analysis of convergence. For this purpose, we first present a key result for our analysis, which is stated as follows.

Lemma 1

If each agent randomly chooses β_ij(k) from {α, (1−α)/(m−1)}, where α ∈ (0, 1/m), then Assumption 1 holds.

Proof

The detailed proof is similar to that of Zhang and Wang (2019), so we omit it. □

Besides, let the variables z_i^k, w_i^k be scalar versions of the corresponding vectors in Eqs. (2)–(3) for all i ∈ V. Therefore, each agent i updates its estimate w_i^k as follows: z_i^k = Σ_{j∈N_i^k} a_ij^k w_j^k, y_i

Applications of decentralized optimization

In this section, we present some application examples in which the problems can be solved by the proposed algorithm.

Numerical simulations

We present simulations to evaluate the performance of the proposed algorithm in this section. To this end, we applied our algorithm to solve a multiclass classification problem. In our simulations, the data example e_i^k ∈ ℝ^d was available only to agent i and belonged to one of the classes C = {1, …, c}, where c denotes a positive constant. The local loss function of agent i had the following form:

    ψ_i(W_i^k) = log(1 + Σ_{ℓ≠y_i^k} exp(w_ℓ^⊤ e_i^k − w_{y_i^k}^⊤ e_i^k)),

where W_i^k = [w_1; …; w_c] ∈ ℝ^{c×d} denotes a decision matrix. The constraint set of this multiclass
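For reference, the sketch below transcribes this loss as we read it for a single example (the decision matrix, example vector, and label are placeholder values, not data from the experiments).

    import numpy as np

    def multiclass_logistic_loss(W, e, y):
        # ψ(W) = log(1 + sum_{l != y} exp(w_l^T e - w_y^T e)) for one example e with label y.
        # W has one row per class (shape c x d), e is a d-dimensional example,
        # and y is the index of its true class.
        scores = W @ e                          # w_l^T e for every class l
        margins = scores - scores[y]            # w_l^T e - w_y^T e
        others = np.delete(margins, y)          # drop the true class (l != y)
        return np.log1p(np.exp(others).sum())   # log(1 + sum exp(.))

    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 6))             # c = 4 classes, d = 6 features
    e = rng.standard_normal(6)
    print(multiclass_logistic_loss(W, e, y=2))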

Conclusion

In this paper, we proposed a privacy-preserving decentralized randomized block-coordinate subgradient projection algorithm over time-varying networks, in which a random subset of the coordinates of each agent's estimate is updated. We proved that the proposed algorithm is convergent with probability 1. Furthermore, we showed that convergence rates of O(log K/K) and O(log K/√K) are obtained under strong convexity and convexity, respectively. Besides, we also showed that the proposed algorithm can protect

CRediT authorship contribution statement

Lin Wang: Conceptualization, Investigation, Writing – review & editing, Funding acquisition. Mingchuan Zhang: Writing – review & editing, Methodology, Software, Funding acquisition. Junlong Zhu: Formal analysis, Visualization, Funding acquisition. Ling Xing: Supervision, Validation. Qingtao Wu: Resources, Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (49)

  • Jiang, Y., et al. (2022). Value iteration and adaptive optimal output regulation with assured convergence rate. Control Engineering Practice.
  • Li, Z., et al. (2018). Stochastic block-coordinate gradient projection algorithms for submodular maximization. Complexity.
  • Beck, A., et al. (2014). An O(1/k) gradient method for network resource allocation problems. IEEE Transactions on Control of Network Systems.
  • Bekkerman, R., et al. (2011).
  • Belomestny, D., et al. (2010). Regression methods for stochastic control problems and their convergence analysis. SIAM Journal on Control and Optimization.
  • Bertsekas, D. P. (2016). Nonlinear programming (3rd ed.).
  • Bertsekas, D. P., et al. (1997). Parallel and distributed computation: Numerical methods.
  • Billingsley, P. (2012). Probability and measure.
  • Camazine, S., et al. (2003). Self-organization in biological systems.
  • Cavalcante, R. L. G., et al. (2009). An adaptive projected subgradient approach to learning in diffusion networks. IEEE Transactions on Signal Processing.
  • Chang, T.-H., et al. (2014). Distributed constrained optimization by consensus-based primal-dual perturbation method. IEEE Transactions on Automatic Control.
  • Duchi, J. C., et al. (2012). Dual averaging for distributed optimization: Convergence analysis and network scaling. IEEE Transactions on Automatic Control.
  • Franklin, J. (2005). The elements of statistical learning: Data mining, inference and prediction. The Mathematical Intelligencer.
  • Jadbabaie, A., et al. (2003). Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Transactions on Automatic Control.
  • Johansson, B. (2008). On distributed optimization in networked systems.
  • Kar, S., et al. (2010). Distributed consensus algorithms in sensor networks: Quantized data and random link failures. IEEE Transactions on Signal Processing.
  • Kar, S., et al. (2012). Distributed parameter estimation in sensor networks: Nonlinear observation models and imperfect communication. IEEE Transactions on Information Theory.
  • Lee, S., et al. (2013). Distributed random projection algorithm for convex optimization. IEEE Journal of Selected Topics in Signal Processing.
  • Lesser, V., et al. (2003). Distributed sensor networks: A multiagent perspective.
  • Mao, S., et al. (2021). A privacy preserving distributed optimization algorithm for economic dispatch over time-varying directed networks. IEEE Transactions on Industrial Informatics.
  • Necoara, I. (2013). Random coordinate descent algorithms for multi-agent convex optimization over networks. IEEE Transactions on Automatic Control.
  • Nedić, A., et al. (2014). On stochastic subgradient mirror-descent algorithm with weighted averaging. SIAM Journal on Optimization.
  • Nedić, A., et al. (2016). Stochastic gradient-push for strongly convex functions on time-varying directed graphs. IEEE Transactions on Automatic Control.
  • Nedić, A., Olshevsky, A., Ozdaglar, A., & Tsitsiklis, J. N. (2008). Distributed subgradient methods and quantization...

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants No. 62002102, No. 61971458, and No. 61976243, in part by the Leading Talents of Science and Technology in the Central Plain of China under Grant No. 224200510004, and in part by the Scientific and Technological Innovation Talents of Colleges and Universities in Henan Province, China under Grant No. 22HASTIT014.
