Abstract
While various machine learning techniques are embraced to make effective decisions in the big data era, preserving the privacy of sensitive data poses significant challenges. In this paper, we develop a privacy-preserving distributed machine learning algorithm to address this issue. Under the assumption that each data provider owns a dataset of a different sample size, our goal is to learn a common classifier over the union of all the local datasets in a distributed way, without leaking any sensitive information about the data samples. Such an algorithm must jointly consider efficient distributed learning and effective privacy preservation. In the proposed algorithm, we extend the stochastic alternating direction method of multipliers (ADMM) to a distributed setting for distributed learning. To preserve privacy during the iterative process, we combine differential privacy with stochastic ADMM. In particular, we propose PS-ADMM, a novel stochastic-ADMM-based privacy-preserving distributed machine learning algorithm that perturbs the updating gradients, which provides a differential privacy guarantee at low computational cost. We theoretically establish the convergence rate and utility bound of PS-ADMM under a strongly convex objective. Through experiments on real-world datasets, we show that PS-ADMM outperforms other differentially private ADMM algorithms under the same differential privacy guarantee.
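To make the mechanism described above concrete, the following is a minimal sketch of gradient-perturbed stochastic consensus ADMM. All names (`local_x_update`, `admm_round`, `sigma`, `grad_loss`) and the exact update order are illustrative assumptions rather than the paper's implementation; in particular, the variance-reduction step analyzed in the appendix is omitted here.

```python
import numpy as np

def local_x_update(x_i, z, lam_i, data_i, grad_loss,
                   rho=1.0, eta=0.1, sigma=1.0, steps=10, rng=None):
    """One provider's x-update in a gradient-perturbed stochastic ADMM round.

    Sketch only: a stochastic gradient of the local loss is perturbed with
    Gaussian noise (the differential privacy mechanism), combined with the
    dual and consensus-penalty gradients, and used for gradient steps on
    the local augmented Lagrangian.
    """
    rng = rng or np.random.default_rng()
    x = x_i.copy()
    for _ in range(steps):
        sample = data_i[rng.integers(len(data_i))]    # draw one local sample
        g = grad_loss(x, sample)                      # stochastic loss gradient
        g = g + rng.normal(0.0, sigma, size=g.shape)  # add DP noise to the gradient
        g = g + lam_i + rho * (x - z)                 # dual + consensus-penalty terms
        x = x - eta * g                               # stochastic gradient step
    return x

def admm_round(xs, z, lams, datasets, grad_loss, rho=1.0):
    """One global round: local x-updates, consensus z-update, dual updates."""
    xs = [local_x_update(x, z, lam, d, grad_loss, rho=rho)
          for x, lam, d in zip(xs, lams, datasets)]
    z = np.mean([x + lam / rho for x, lam in zip(xs, lams)], axis=0)
    lams = [lam + rho * (x - z) for x, lam in zip(xs, lams)]
    return xs, z, lams
```

Because only noisy gradients ever leave a provider's local loop, the privacy cost is paid per gradient query, which is what makes the per-iteration computation cheap relative to objective- or output-perturbation ADMM variants.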
Acknowledgement
The work of J. Ding and M. Pan was supported in part by the U.S. National Science Foundation under grants CNS-1613661, CNS-1646607, CNS-1702850, and CNS-1801925. The work of Y. Gong was supported in part by the U.S. National Science Foundation under grant CNS-1850523. The work of H. Zhang was supported in part by the National Natural Science Foundation of China (Grants No. 61822104 and 61771044), the Beijing Natural Science Foundation (Nos. L172025 and L172049), and the 111 Project (No. B170003).
A Appendix
The approximate gradient \(\varvec{g}_i^s\) can be written as \(\varvec{g}_i^s = \varvec{b}_i^s + \varvec{q}_i^s\), where
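For intuition, under the assumption that PS-ADMM uses an SVRG-style variance-reduced gradient (this exact form is an assumption for illustration, not taken from the text), one common construction over a sampled index \(I_s\) with snapshot iterate \(\tilde{\varvec{x}}_i\) is
\[
\varvec{b}_i^s = \nabla l_{iI_s}(\varvec{v}_i^s) - \nabla l_{iI_s}(\tilde{\varvec{x}}_i) + \nabla f_i(\tilde{\varvec{x}}_i),
\]
which is unbiased, since
\[
\mathbb{E}_{I_s}[\varvec{b}_i^s] = \nabla f_i(\varvec{v}_i^s) - \nabla f_i(\tilde{\varvec{x}}_i) + \nabla f_i(\tilde{\varvec{x}}_i) = \nabla f_i(\varvec{v}_i^s),
\]
consistent with the identity \(\mathbb{E}(\varvec{b}_i^s) = \nabla f_i(\varvec{v}_i^s)\) used in the proof of Lemma 4. The term \(\varvec{q}_i^s\) would then carry the privacy noise and the linear ADMM terms, matching the identity for \(L_i(\varvec{v}_i^s) - L_i(\varvec{x}_i)\) invoked in the proof of Lemma 7.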
A.1 Proof of Lemma 1
Proof
Since each \(l_{im}(\varvec{x})\) is convex, G-Lipschitz, and has an \(L_m\)-Lipschitz continuous gradient, for any \(\varvec{x}_1\) and \(\varvec{x}_2\) we have \(\Vert \nabla l_{im}(\varvec{x}_1) - \nabla l_{im}(\varvec{x}_2)\Vert \le L_m \Vert \varvec{x}_1 - \varvec{x}_2\Vert .\)
We can see that \(f_i(\varvec{x})\) is \(v_f\)-smooth; that is, \(f_i(\varvec{x}_2) \le f_i(\varvec{x}_1) + (\varvec{x}_2 - \varvec{x}_1)^T\nabla f_i(\varvec{x}_1) + \frac{v_f}{2} \Vert \varvec{x}_2 - \varvec{x}_1\Vert^2\), where \(v_f = \max_m L_m\). Then, we have
where we let \(v_L \ge v_f + \rho\). Thus, \(L_i(\varvec{x})\) and \(\hat{L}_m(\varvec{x})\) are \(v_L\)-smooth. Moreover, it is easy to see that \(L_i(\varvec{x})\) is \(\mu_L\)-strongly convex with \(\mu_L \le \mu_f + \rho\).
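For a one-line check of these constants, assume \(L_i\) takes the standard augmented-Lagrangian form (an assumption consistent with the updates above):
\[
L_i(\varvec{x}) = f_i(\varvec{x}) + \varvec{\lambda}_i^T(\varvec{x} - \varvec{z}) + \frac{\rho}{2}\Vert \varvec{x} - \varvec{z}\Vert^2 .
\]
The linear term contributes no curvature and the quadratic term adds exactly \(\rho\) to both curvature bounds, so for all \(\varvec{x}_1, \varvec{x}_2\),
\[
\frac{\mu_f + \rho}{2}\Vert \varvec{x}_1 - \varvec{x}_2\Vert^2 \le L_i(\varvec{x}_1) - L_i(\varvec{x}_2) - (\varvec{x}_1 - \varvec{x}_2)^T \nabla L_i(\varvec{x}_2) \le \frac{v_f + \rho}{2}\Vert \varvec{x}_1 - \varvec{x}_2\Vert^2 ,
\]
which gives \(v_L\)-smoothness for any \(v_L \ge v_f + \rho\) and \(\mu_L\)-strong convexity for any \(\mu_L \le \mu_f + \rho\).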
A.2 Basic Lemmas
Lemma 2
The variance of \(\varvec{g}_i^s\) satisfies
Proof
Notice that
Hence, the variance of \(\varvec{g}_i^s\) can be bounded as
where the first inequality uses \(\Vert a + b\Vert ^2 \le 2\Vert a\Vert ^2 + 2 \Vert b\Vert ^2\) and the second inequality uses \(\mathbb {E}\Vert \varvec{x}_i - \mathbb {E}\varvec{x}_i\Vert ^2 = \mathbb {E}\Vert \varvec{x}_i\Vert ^2 - \Vert \mathbb {E}\varvec{x}_i\Vert ^2 \le \mathbb {E}\Vert \varvec{x}_i\Vert ^2\).
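For completeness, both facts used above follow in one line each:
\[
\Vert a + b\Vert^2 = \Vert a\Vert^2 + 2a^T b + \Vert b\Vert^2 \le 2\Vert a\Vert^2 + 2\Vert b\Vert^2, \quad \text{since } 2a^T b \le \Vert a\Vert^2 + \Vert b\Vert^2,
\]
and
\[
\mathbb{E}\Vert \varvec{x}_i - \mathbb{E}\varvec{x}_i\Vert^2 = \mathbb{E}\Vert \varvec{x}_i\Vert^2 - 2(\mathbb{E}\varvec{x}_i)^T\mathbb{E}\varvec{x}_i + \Vert \mathbb{E}\varvec{x}_i\Vert^2 = \mathbb{E}\Vert \varvec{x}_i\Vert^2 - \Vert \mathbb{E}\varvec{x}_i\Vert^2 \le \mathbb{E}\Vert \varvec{x}_i\Vert^2 .
\]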
Lemma 3
For \(0<\eta < \frac{1}{2 v_{L}}\), we have
Proof
Taking expectation on both sides, we obtain
Then, we have
By choosing \(\eta < 1/{(2v_{L })}\), we get
Lemma 4
Proof
We have \(\mathbb {E}(\varvec{b}_i^s) = \nabla f_i(\varvec{v}_i^s)\) and this leads to
Then, we have
According to Lemma 3, we obtain
Lemma 5
where \(\varvec{\alpha }_i^{k+1} = \varvec{\lambda }_i^k +\rho (\varvec{x}_i^{k+1}-\varvec{z}^k).\)
Proof
By deriving the optimality conditions of the minimization problem in (5), we have
Then, by using the notation \(\varvec{\alpha }_i^{{ k+1 }} = \varvec{\lambda }_i^k +\rho (\varvec{x}_i^{k+1}-\varvec{z}^k)\), we obtain
Lemma 6
where \(\varvec{\alpha }_i^{k+1} = \varvec{\lambda }_i^k +\rho (\varvec{x}_i^{k+1}-\varvec{z}^k).\)
Proof
Lemma 7
Assume \(f_i(\cdot)\) is \(\mu_f\)-strongly convex, and let \(\varvec{x}_i^{k+1}\), \(\varvec{z}^k\), and \(\varvec{\lambda}_i^k\) be generated by the proposed algorithm. For \(\eta\) satisfying \(1-\frac{\rho \xi}{2}-\frac{\mu_f \xi}{4}+\frac{4\eta^2 v_L^2 S}{1-2\eta v_L} \le \frac{S \eta \mu_f}{2}\), the following holds:
where \(\varvec{\alpha }_i^{k+1} = \varvec{\lambda }_i^k +\rho (\varvec{x}_i^{k+1}-\varvec{z}^k).\)
Proof
Using Lemma 4 and the strong convexity of \(L_i(\varvec{v}_i)\), we have
where \(\zeta = 2\eta - \frac{4\eta }{1-2\eta v_L}\), \(\xi =\frac{4\eta }{1-2\eta v_L} \).
Then, we obtain
where we apply Lemma 2 and \(L_i(\varvec{v}_i^s) - L_i(\varvec{x}_i) = f_i(\varvec{v}_i^s) - f_i(\varvec{x}_i) + (\varvec{v}_i^s - \varvec{x}_i)^T\varvec{q}_i^s -\frac{\rho}{2}\Vert \varvec{v}_i^s - \varvec{x}_i\Vert^2\) to obtain the inequality. Hence, we choose \(\eta \le \frac{4\mu_L-4\rho-3\mu_f}{8v_L^2+2\mu_f v_L}\) so that \(1-\frac{\rho \xi}{2}-\frac{\mu_f \xi}{4} \ge 1+\frac{4\eta^2 v_L^2}{1-2\eta v_L}-\frac{\mu_L \xi}{2}-\frac{\mu_f \zeta}{4}\). Noting that \(f_i(\varvec{v}_i^s) - f_i(\varvec{x}_i) + (\varvec{v}_i^s - \varvec{x}_i)^T\varvec{q}_i^s\) is convex in \(\varvec{v}_i^s\) and applying Jensen's inequality, we have
Summing over \(s = 0,1,\ldots,S-1\), we obtain
where \(\varvec{\alpha }_i^{{ k+1 }} = \varvec{\lambda }_i^k +\rho (\varvec{x}_i^{k+1}-\varvec{z}^k).\)
Thus, we have
where we assume \( 1-\frac{\rho \xi }{2}-\frac{\mu _f\xi }{4} +\frac{4\eta ^2 v_L^2 S}{1-2\eta v_L} \le \frac{ S\eta \mu _f }{2}.\)
A.3 Proof of Theorem 3
Proof
Combining Lemmas 5, 6, and 7 and using the convergence criterion (9), we let \(\varvec{w}_i^{k+1} = (\varvec{x}_i^{k+1}; \varvec{z}^{k+1}; \varvec{\alpha}_i^{k+1})\). For any \(\varvec{w} = (\varvec{x}_i; \varvec{z}; \varvec{\lambda}_i)\), we have
where \(F(\varvec{w}) = \) \( \begin{pmatrix} \varvec{\alpha }_i \\ -\varvec{\alpha }_i \\ -(\varvec{x}_i - \varvec{z}) \end{pmatrix} \).
Summing the inequality over \(k = 0,1,\ldots,K-1\) and using Jensen's inequality, we get
If we take \(\varvec{x} = \varvec{x}^*\) and \(\varvec{z} = \varvec{z}^*\), we have
A.4 Proof of Theorem 4
By choosing \(\eta\) to satisfy the condition in Theorem 3 and taking \(S = O(\frac{v_f}{\mu_f})\), we can make \(A = \frac{\mu_f}{4}\Vert \varvec{x}_i^0 - \varvec{x}_i^*\Vert^2 + \frac{\rho}{2}\Vert \varvec{z}^0 - \varvec{z}^*\Vert^2 + \frac{1}{2\rho}\big(\Vert \varvec{\lambda}_i^0\Vert^2 + \tau_i^2\big)\) a constant.
Then, we have
Thus, if we choose \(K = O\left( \frac{M \epsilon }{G}\sqrt{\frac{\mu _f}{v_f p \ln (1/\delta )}}\right) ,\) we have