Abstract
Machine learning has been widely applied in practice, for example in disease diagnosis and target detection. A good model commonly relies on massive training data collected from different sources; however, the collected data might expose sensitive information. To address this problem, researchers have proposed many excellent methods that combine machine learning with privacy-protection technologies such as secure multiparty computation (MPC), homomorphic encryption (HE), and differential privacy. Meanwhile, other researchers have proposed distributed machine learning, which allows clients to store their data locally while training a model collaboratively. The first kind of method focuses on security, but its performance and accuracy remain to be improved; the second provides higher accuracy and better performance but weaker security: for instance, an adversary can launch membership attacks against gradient updates sent in plaintext.
In this paper, we incorporate secret sharing into distributed machine learning to achieve reliable performance, high accuracy, and high-level security. We design, implement, and evaluate a practical system that jointly learns an accurate model under semi-honest and servers-only malicious adversary models, respectively. Our experiments show that our protocols achieve the best overall performance as well.
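The core idea described above can be illustrated with a minimal sketch (not the paper's actual protocol): each client additively secret-shares its integer-encoded gradient update between two non-colluding servers, so neither server sees any individual update in the clear, yet the aggregate can still be reconstructed for the model step. The modulus and toy values below are purely illustrative.

```python
import random

P = 2**32  # illustrative modulus for fixed-point-encoded gradients

def share(update):
    """Additively split `update` into one share per server."""
    r = random.randrange(P)
    return r, (update - r) % P  # share for server 0, share for server 1

client_updates = [5, 7, 11]  # toy gradient values, already integer-encoded
shares0, shares1 = zip(*(share(u) for u in client_updates))

# Each server sums only its own shares; combining the two partial sums
# reveals just the aggregate, never an individual client's update.
agg = (sum(shares0) + sum(shares1)) % P
assert agg == sum(client_updates) % P
```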
Notes
- 1. MNIST database, http://yann.lecun.com/exdb/mnist/. Accessed: 2017-09-24.
- 2.
- 3.
- 4. We only implement the basic secure aggregation with no dropouts.
References
Shamir, A.: How to share a secret. Commun. ACM 22(11), 612–613 (1979). https://doi.org/10.1145/359168.359176
Lindell, Y., Pinkas, B.: Privacy preserving data mining. In: Bellare, M. (ed.) CRYPTO 2000. LNCS, vol. 1880, pp. 36–54. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-44598-6_3
Du, W., Atallah, M.J.: Privacy-preserving cooperative scientific computations. In: CSFW. IEEE (2001). 0273. https://doi.org/10.1109/CSFW.2001.930152
Du, W., Han, Y.S., Chen, S.: Privacy-preserving multivariate statistical analysis: linear regression and classification. In: Proceedings of the 2004 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, pp. 222–233 (2004). https://doi.org/10.1137/1.9781611972740.21
Sanil, A.P., Karr, A.F., Lin, X., et al.: Privacy-preserving regression modelling via distributed computation. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 677–682. ACM (2004). https://doi.org/10.1145/1014052.1014139
Jagannathan, G., Wright, R.N.: Privacy-preserving distributed k-means clustering over arbitrarily partitioned data. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 593–599. ACM (2005). https://doi.org/10.1145/1081870.1081942
Yu, H., Vaidya, J., Jiang, X.: Privacy-preserving SVM classification on vertically partitioned data. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, pp. 647–656. Springer, Heidelberg (2006). https://doi.org/10.1007/11731139_74
Slavkovic, A.B., Nardi, Y., Tibbits, M.M.: “Secure” logistic regression of horizontally and vertically partitioned distributed databases. In: Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007), pp. 723–728. IEEE (2007). https://doi.org/10.1109/ICDMW.2007.114
Bunn, P., Ostrovsky, R.: Secure two-party k-means clustering. In: Proceedings of the 14th ACM Conference on Computer and Communications Security, pp. 486–497. ACM (2007). https://doi.org/10.1145/1315245.1315306
Vaidya, J., Yu, H., Jiang, X.: Privacy-preserving SVM classification. Knowl. Inf. Syst. 14(2), 161–178 (2008). https://doi.org/10.1007/s10115-007-0073-7
Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Lechevallier, Y., Saporta, G. (eds.) Proceedings of COMPSTAT 2010, pp. 177–186. Physica-Verlag HD, Heidelberg (2010). https://doi.org/10.1007/978-3-7908-2604-3_16
Damgård, I., Pastro, V., Smart, N., Zakarias, S.: Multiparty computation from somewhat homomorphic encryption. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 643–662. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32009-5_38
Nikolaenko, V., Ioannidis, S., Weinsberg, U., et al.: Privacy-preserving matrix factorization. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, pp. 801–812. ACM (2013). https://doi.org/10.1145/2508859.2516751
Wu, S., Teruya, T., Kawamoto, J.: Privacy-preservation for stochastic gradient descent application to secure logistic regression. In: The 27th Annual Conference of the Japanese Society for Artificial Intelligence, vol. 27, pp. 1–4 (2013)
Song, S., Chaudhuri, K., Sarwate, A.D.: Stochastic gradient descent with differentially private updates. In: 2013 IEEE Global Conference on Signal and Information Processing, pp. 245–248. IEEE (2013). https://doi.org/10.1109/GlobalSIP.2013.6736861
Li, M., Andersen, D.G., Park, J.W., et al.: Scaling distributed machine learning with the parameter server. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pp. 583–598 (2014). https://doi.org/10.1145/2640087.2644155
Shokri, R., Shmatikov, V.: Privacy-preserving deep learning. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1310–1321. ACM (2015). https://doi.org/10.1145/2810103.2813687
Abadi, M., Chu, A., Goodfellow, I., et al.: Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318. ACM (2016). https://doi.org/10.1145/2976749.2978318
Gascón, A., Schoppmann, P., Balle, B., et al.: Secure linear regression on vertically partitioned datasets. IACR Cryptology ePrint Archive 2016, 892 (2016)
Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (GDPR). Official J. Eur. Union, L119 (2016)
Gilad-Bachrach, R., Dowlin, N., Laine, K., et al.: CryptoNets: applying neural networks to encrypted data with high throughput and accuracy. In: International Conference on Machine Learning, pp. 201–210 (2016)
Mohassel, P., Zhang, Y.: SecureML: a system for scalable privacy-preserving machine learning. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 19–38. IEEE (2017). https://doi.org/10.1109/SP.2017.12
Liu, J., Juuti, M., Lu, Y., et al.: Oblivious neural network predictions via MiniONN transformations. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 619–631. ACM (2017). https://doi.org/10.1145/3133956.3134056
Bonawitz, K., Ivanov, V., Kreuter, B., et al.: Practical secure aggregation for privacy-preserving machine learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1175–1191. ACM (2017). https://doi.org/10.1145/3133956.3133982
Lin, Y., Han, S., Mao, H., et al.: Deep gradient compression: reducing the communication bandwidth for distributed training. arXiv preprint arXiv:1712.01887 (2017)
Riazi, M.S., Weinert, C., Tkachenko, O., et al.: Chameleon: a hybrid secure computation framework for machine learning applications. In: Proceedings of the 2018 on Asia Conference on Computer and Communications Security, pp. 707–721. ACM (2018). https://doi.org/10.1145/3196494.3196522
Phong, L.T., Aono, Y., Hayashi, T., et al.: Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans. Inf. Forensics Secur. 13(5), 1333–1345 (2018). https://doi.org/10.1109/TIFS.2017.2787987
Wagh, S., Gupta, D., Chandran, N.: SecureNN: 3-party secure computation for neural network training. Proc. Priv. Enhancing Technol. 2019(3), 26–49 (2019). https://doi.org/10.2478/popets-2019-0035
Nasr, M., Shokri, R., Houmansadr, A.: Comprehensive privacy analysis of deep learning: stand-alone and federated learning under passive and active white-box inference attacks. arXiv preprint arXiv:1812.00910 (2018)
Yang, Q., Liu, Y., Chen, T., et al.: Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. (TIST) 10(2), 12 (2019). https://doi.org/10.1145/3298981
Juvekar, C., Vaikuntanathan, V., Chandrakasan, A.: GAZELLE: a low latency framework for secure neural network inference. In: 27th USENIX Security Symposium (USENIX Security 18), pp. 1651–1669 (2018)
Centers for Medicare & Medicaid Services. The Health Insurance Portability and Accountability Act of 1996 (HIPAA) (1996). http://www.cms.hhs.gov/hipaa/
Acknowledgements
We are grateful to the anonymous reviewers for their comprehensive comments. We also thank Xiangfu Song and Yiran Liu from Shandong University for helpful discussions on MPC, and Junming Ke from the Singapore University of Technology and Design for his help. This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences, Grant No. XDC02040400.
Appendices
A Proof of Correctness
A.1 Lemma 1
Proof
Suppose we have two secrets, \(s_0\) and \(s_1\), and we share both in Shamir's secret sharing scheme with two polynomials

\(f(x) = a_0 + a_1x + \dots + a_{t-1}x^{t-1} \bmod p\),

\(g(x) = b_0 + b_1x + \dots + b_{t-1}x^{t-1} \bmod p\),

where \(f(0)=a_0 = s_0\), \(g(0)=b_0=s_1\), and p is a large prime.

To compute the shares, we evaluate f(x) and g(x) at n different points, obtaining \(f(x_0),f(x_1),\dots,f(x_{n-1})\) and \(g(x_0),g(x_1),\dots,g(x_{n-1})\), respectively.

We then turn to obtaining the shares of \(s_0 + s_1\). We define a new polynomial

\(h(x) = f(x) + g(x) = (a_0+b_0) + (a_1+b_1)x + \dots + (a_{t-1}+b_{t-1})x^{t-1} \bmod p\).

Obviously, h(x) is a polynomial of degree \(t-1\) with t coefficients and \(h(0) = s_0+ s_1\). On the one hand, the values \(h(x_i)\) are shares of \(s_0+s_1\); on the other hand, we can confirm

\(h(x_i) = f(x_i) + g(x_i)\).

So the shares of \(s_0+s_1\) can be computed by adding the corresponding shares of \(s_0\) and \(s_1\).
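The argument above can be checked with a small sketch: we share two secrets with Shamir's scheme, add the corresponding shares pointwise, and reconstruct the sum by Lagrange interpolation. The prime, threshold, and party count below are illustrative, not those of the paper's implementation.

```python
import random

P = 2**61 - 1  # a large prime modulus

def share(secret, t, n):
    """Split `secret` with a random degree-(t-1) polynomial f, f(0) = secret."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the given t shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

s0, s1, t, n = 123, 456, 3, 5
sh0, sh1 = share(s0, t, n), share(s1, t, n)
# Add corresponding shares: (x_i, f(x_i) + g(x_i)) are shares of s0 + s1.
sh_sum = [(x, (y0 + y1) % P) for (x, y0), (_, y1) in zip(sh0, sh1)]
assert reconstruct(sh_sum[:t]) == (s0 + s1) % P
```

Any t of the summed shares reconstruct \(s_0+s_1\), matching the lemma.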
A.2 Lemma 2
Proof
Suppose we have \(x_0,x_1,\dots,x_{n-1}\) and a secret key \(\alpha \). We can then compute the MAC of each \(x_i\) as

\(m(x_i) = \alpha \cdot x_i \bmod p\).

Then we can also compute the MAC of the \(x_i\)'s sum:

\(m\left(\sum_{i} x_i\right) = \alpha \cdot \sum_{i} x_i \bmod p\).

Then it is easy to confirm

\(m\left(\sum_{i} x_i\right) = \sum_{i} m(x_i) \bmod p\).
For a more concrete proof, please refer to [12].
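The linearity of this SPDZ-style MAC can be demonstrated with a minimal sketch; the modulus, key, and values below are illustrative only.

```python
P = 2**61 - 1   # illustrative prime field
alpha = 19088743  # toy MAC key (in SPDZ [12] it is itself secret-shared)

def mac(x):
    """Linear MAC m(x) = alpha * x mod p."""
    return alpha * x % P

xs = [10, 20, 30, 40]
macs = [mac(x) for x in xs]
# MAC of the sum equals the sum of the MACs, so parties can aggregate
# MACed shares locally and still verify the aggregate.
assert mac(sum(xs) % P) == sum(macs) % P
```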
B Accuracy and Performance for Linear Regression and MLP
B.1 Linear Regression
See Fig. 6.
B.2 MLP
See Fig. 7.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Dong, Y., Chen, X., Shen, L., Wang, D. (2020). Privacy-Preserving Distributed Machine Learning Based on Secret Sharing. In: Zhou, J., Luo, X., Shen, Q., Xu, Z. (eds) Information and Communications Security. ICICS 2019. Lecture Notes in Computer Science(), vol 11999. Springer, Cham. https://doi.org/10.1007/978-3-030-41579-2_40
Print ISBN: 978-3-030-41578-5
Online ISBN: 978-3-030-41579-2