Abstract
Machine learning has been widely applied in practice, for example in disease diagnosis and target detection. A good model commonly relies on massive training data collected from different sources; however, the collected data might expose sensitive information. To address this problem, researchers have proposed many excellent methods that combine machine learning with privacy-protection technologies such as secure multiparty computation (MPC), homomorphic encryption (HE), and differential privacy. Meanwhile, other researchers have proposed distributed machine learning, which allows clients to store their data locally while training a model collaboratively. The first kind of method focuses on security, but its performance and accuracy remain to be improved; the second provides higher accuracy and better performance but weaker security: for instance, an adversary can launch membership attacks against gradient updates sent in plaintext.
In this paper, we incorporate secret sharing into distributed machine learning to achieve reliable performance, high accuracy, and high-level security. We design, implement, and evaluate a practical system that jointly learns an accurate model under semi-honest and servers-only malicious adversary models, respectively. Our experiments show that our protocols achieve the best overall performance as well.
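The core idea described above can be illustrated with a minimal sketch (not the paper's actual protocol): each client additively secret-shares its integer-encoded gradient update between two non-colluding servers, so neither server sees any individual update in the clear, yet the aggregate can still be reconstructed for the model step. The modulus and toy values below are purely illustrative.

```python
import random

P = 2**32  # illustrative modulus for fixed-point-encoded gradients

def share(update):
    """Additively split `update` into one share per server."""
    r = random.randrange(P)
    return r, (update - r) % P  # share for server 0, share for server 1

client_updates = [5, 7, 11]  # toy gradient values, already integer-encoded
shares0, shares1 = zip(*(share(u) for u in client_updates))

# Each server sums only its own shares; combining the two partial sums
# reveals just the aggregate, never an individual client's update.
agg = (sum(shares0) + sum(shares1)) % P
assert agg == sum(client_updates) % P
```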
Notes
- 1. MNIST database, http://yann.lecun.com/exdb/mnist/. Accessed: 2017-09-24.
- 2.
- 3.
- 4. We only implement the basic secure aggregation with no dropouts.
References
Shamir, A.: How to share a secret. Commun. ACM 22(11), 612–613 (1979). https://doi.org/10.1145/359168.359176
Lindell, Y., Pinkas, B.: Privacy preserving data mining. In: Bellare, M. (ed.) CRYPTO 2000. LNCS, vol. 1880, pp. 36–54. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-44598-6_3
Du, W., Atallah, M.J.: Privacy-preserving cooperative scientific computations. In: CSFW. IEEE (2001). 0273. https://doi.org/10.1109/CSFW.2001.930152
Du, W., Han, Y.S., Chen, S.: Privacy-preserving multivariate statistical analysis: linear regression and classification. In: Proceedings of the 2004 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, pp. 222–233 (2004). https://doi.org/10.1137/1.9781611972740.21
Sanil, A.P., Karr, A.F., Lin, X., et al.: Privacy-preserving regression modelling via distributed computation. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 677–682. ACM (2004). https://doi.org/10.1145/1014052.1014139
Jagannathan, G., Wright, R.N.: Privacy-preserving distributed k-means clustering over arbitrarily partitioned data. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 593–599. ACM (2005). https://doi.org/10.1145/1081870.1081942
Yu, H., Vaidya, J., Jiang, X.: Privacy-preserving SVM classification on vertically partitioned data. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, pp. 647–656. Springer, Heidelberg (2006). https://doi.org/10.1007/11731139_74
Slavkovic, A.B., Nardi, Y., Tibbits, M.M.: “Secure” logistic regression of horizontally and vertically partitioned distributed databases. In: Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007), pp. 723–728. IEEE (2007). https://doi.org/10.1109/ICDMW.2007.114
Bunn, P., Ostrovsky, R.: Secure two-party k-means clustering. In: Proceedings of the 14th ACM Conference on Computer and Communications Security, pp. 486–497. ACM (2007). https://doi.org/10.1145/1315245.1315306
Vaidya, J., Yu, H., Jiang, X.: Privacy-preserving SVM classification. Knowl. Inf. Syst. 14(2), 161–178 (2008). https://doi.org/10.1007/s10115-007-0073-7
Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Lechevallier, Y., Saporta, G. (eds.) Proceedings of COMPSTAT 2010, pp. 177–186. Physica-Verlag HD, Heidelberg (2010). https://doi.org/10.1007/978-3-7908-2604-3_16
Damgård, I., Pastro, V., Smart, N., Zakarias, S.: Multiparty computation from somewhat homomorphic encryption. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 643–662. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32009-5_38
Nikolaenko, V., Ioannidis, S., Weinsberg, U., et al.: Privacy-preserving matrix factorization. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, pp. 801–812. ACM (2013). https://doi.org/10.1145/2508859.2516751
Wu, S., Teruya, T., Kawamoto, J.: Privacy-preservation for stochastic gradient descent application to secure logistic regression. In: The 27th Annual Conference of the Japanese Society for Artificial Intelligence, vol. 27, pp. 1–4 (2013)
Song, S., Chaudhuri, K., Sarwate, A.D.: Stochastic gradient descent with differentially private updates. In: 2013 IEEE Global Conference on Signal and Information Processing, pp. 245–248. IEEE (2013). https://doi.org/10.1109/GlobalSIP.2013.6736861
Li, M., Andersen, D.G., Park, J.W., et al.: Scaling distributed machine learning with the parameter server. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pp. 583–598 (2014). https://doi.org/10.1145/2640087.2644155
Shokri, R., Shmatikov, V.: Privacy-preserving deep learning. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1310–1321. ACM (2015). https://doi.org/10.1145/2810103.2813687
Abadi, M., Chu, A., Goodfellow, I., et al.: Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318. ACM (2016). https://doi.org/10.1145/2976749.2978318
Gascón, A., Schoppmann, P., Balle, B., et al.: Secure linear regression on vertically partitioned datasets. IACR Cryptology ePrint Archive 2016, 892 (2016)
Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (GDPR). Official J. Eur. Union, L119 (2016)
Gilad-Bachrach, R., Dowlin, N., Laine, K., et al.: CryptoNets: applying neural networks to encrypted data with high throughput and accuracy. In: International Conference on Machine Learning, pp. 201–210 (2016)
Mohassel, P., Zhang, Y.: SecureML: a system for scalable privacy-preserving machine learning. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 19–38. IEEE (2017). https://doi.org/10.1109/SP.2017.12
Liu, J., Juuti, M., Lu, Y., et al.: Oblivious neural network predictions via MiniONN transformations. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 619–631. ACM (2017). https://doi.org/10.1145/3133956.3134056
Bonawitz, K., Ivanov, V., Kreuter, B., et al.: Practical secure aggregation for privacy-preserving machine learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1175–1191. ACM (2017). https://doi.org/10.1145/3133956.3133982
Lin, Y., Han, S., Mao, H., et al.: Deep gradient compression: reducing the communication bandwidth for distributed training. arXiv preprint arXiv:1712.01887 (2017)
Riazi, M.S., Weinert, C., Tkachenko, O., et al.: Chameleon: a hybrid secure computation framework for machine learning applications. In: Proceedings of the 2018 on Asia Conference on Computer and Communications Security, pp. 707–721. ACM (2018). https://doi.org/10.1145/3196494.3196522
Phong, L.T., Aono, Y., Hayashi, T., et al.: Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans. Inf. Forensics Secur. 13(5), 1333–1345 (2018). https://doi.org/10.1109/TIFS.2017.2787987
Wagh, S., Gupta, D., Chandran, N.: SecureNN: 3-party secure computation for neural network training. Proc. Priv. Enhancing Technol. 2019(3), 26–49 (2019). https://doi.org/10.2478/popets-2019-0035
Nasr, M., Shokri, R., Houmansadr, A.: Comprehensive privacy analysis of deep learning: stand-alone and federated learning under passive and active white-box inference attacks. arXiv preprint arXiv:1812.00910 (2018)
Yang, Q., Liu, Y., Chen, T., et al.: Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. (TIST) 10(2), 12 (2019). https://doi.org/10.1145/3298981
Juvekar, C., Vaikuntanathan, V., Chandrakasan, A.: GAZELLE: a low latency framework for secure neural network inference. In: 27th USENIX Security Symposium (USENIX Security 18), pp. 1651–1669 (2018)
Centers for Medicare & Medicaid Services. The Health Insurance Portability and Accountability Act of 1996 (HIPAA) (1996). http://www.cms.hhs.gov/hipaa/
Acknowledgements
We are grateful to the anonymous reviewers for their comprehensive comments. We also thank Xiangfu Song and Yiran Liu from Shandong University for helpful discussions on MPC, and Junming Ke from the Singapore University of Technology and Design for his help. This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences, Grant No. XDC02040400.
Appendices
A Proof of Correctness
A.1 Lemma 1
Proof
Suppose we have two secrets, \(s_0\) and \(s_1\), and we share both in Shamir's secret sharing scheme with two polynomials

\(f(x) = a_0 + a_1x + \dots + a_{t-1}x^{t-1} \bmod p\),

\(g(x) = b_0 + b_1x + \dots + b_{t-1}x^{t-1} \bmod p\),

where \(f(0)=a_0 = s_0\), \(g(0)=b_0=s_1\), and p is a large prime.

To compute the shares, we evaluate f(x) and g(x) at n different points, obtaining \(f(x_0),f(x_1),\dots,f(x_{n-1})\) and \(g(x_0),g(x_1),\dots,g(x_{n-1})\), respectively.

We then turn to obtaining the shares of \(s_0 + s_1\). We define a new polynomial

\(h(x) = f(x) + g(x) = (a_0+b_0) + (a_1+b_1)x + \dots + (a_{t-1}+b_{t-1})x^{t-1} \bmod p\).

Obviously, h(x) is a polynomial of degree \(t-1\) with t coefficients and \(h(0) = s_0+ s_1\). On the one hand, the values \(h(x_i)\) are shares of \(s_0+s_1\); on the other hand, we can confirm

\(h(x_i) = f(x_i) + g(x_i)\).

So the shares of \(s_0+s_1\) can be computed by adding the corresponding shares of \(s_0\) and \(s_1\).
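The argument above can be checked with a small sketch: we share two secrets with Shamir's scheme, add the corresponding shares pointwise, and reconstruct the sum by Lagrange interpolation. The prime, threshold, and party count below are illustrative, not those of the paper's implementation.

```python
import random

P = 2**61 - 1  # a large prime modulus

def share(secret, t, n):
    """Split `secret` with a random degree-(t-1) polynomial f, f(0) = secret."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the given t shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

s0, s1, t, n = 123, 456, 3, 5
sh0, sh1 = share(s0, t, n), share(s1, t, n)
# Add corresponding shares: (x_i, f(x_i) + g(x_i)) are shares of s0 + s1.
sh_sum = [(x, (y0 + y1) % P) for (x, y0), (_, y1) in zip(sh0, sh1)]
assert reconstruct(sh_sum[:t]) == (s0 + s1) % P
```

Any t of the summed shares reconstruct \(s_0+s_1\), matching the lemma.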
A.2 Lemma 2
Proof
Suppose we have \(x_0,x_1,\dots,x_{n-1}\) and a secret key \(\alpha \). We can then compute the MAC of each \(x_i\) as

\(m(x_i) = \alpha \cdot x_i \bmod p\).

Then we can also compute the MAC of the \(x_i\)'s sum:

\(m\left(\sum_{i} x_i\right) = \alpha \cdot \sum_{i} x_i \bmod p\).

Then it is easy to confirm

\(m\left(\sum_{i} x_i\right) = \sum_{i} m(x_i) \bmod p\).
For a more concrete proof, please refer to [12].
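The linearity of this SPDZ-style MAC can be demonstrated with a minimal sketch; the modulus, key, and values below are illustrative only.

```python
P = 2**61 - 1   # illustrative prime field
alpha = 19088743  # toy MAC key (in SPDZ [12] it is itself secret-shared)

def mac(x):
    """Linear MAC m(x) = alpha * x mod p."""
    return alpha * x % P

xs = [10, 20, 30, 40]
macs = [mac(x) for x in xs]
# MAC of the sum equals the sum of the MACs, so parties can aggregate
# MACed shares locally and still verify the aggregate.
assert mac(sum(xs) % P) == sum(macs) % P
```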
B Accuracy and Performance for Linear Regression and MLP
B.1 Linear Regression
See Fig. 6.
B.2 MLP
See Fig. 7.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Dong, Y., Chen, X., Shen, L., Wang, D. (2020). Privacy-Preserving Distributed Machine Learning Based on Secret Sharing. In: Zhou, J., Luo, X., Shen, Q., Xu, Z. (eds) Information and Communications Security. ICICS 2019. Lecture Notes in Computer Science(), vol 11999. Springer, Cham. https://doi.org/10.1007/978-3-030-41579-2_40
Print ISBN: 978-3-030-41578-5
Online ISBN: 978-3-030-41579-2