Abstract
We show how multiple data-owning parties can collaboratively train several machine learning algorithms without jeopardizing the privacy of their sensitive data. In particular, we assume that every party knows specific features of an overlapping set of people. Using a secure implementation of an advanced hidden set intersection protocol and a privacy-preserving Gradient Descent algorithm, we are able to train a Ridge, LASSO or SVM model over the intersection of people in their data sets. Both the hidden set intersection protocol and the privacy-preserving LASSO implementation are unprecedented in the literature.
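To make the pipeline described in the abstract concrete, the following is a minimal plaintext sketch: it joins two feature tables on their shared identifiers and then fits a Ridge or LASSO model by gradient descent. It is not privacy-preserving and is not the authors' protocol. The data, the names `party_a`, `party_b`, `join_on_shared_ids` and `train`, the objective scaling, and the use of a soft-thresholding (proximal) step for LASSO are all assumptions made for illustration.

```python
# Plaintext (NOT privacy-preserving) sketch with hypothetical data and names.
# Party A holds one feature per person; party B holds another feature and the target.
party_a = {"id1": [0.3], "id2": [0.7], "id3": [1.0], "id4": [-1.0], "id5": [0.5], "id6": [-0.5]}
party_b = {"id3": ([0.5], 1.5), "id4": ([1.0], -3.0), "id5": ([-1.0], 2.0),
           "id6": ([-0.5], -0.5), "id7": ([0.2], 0.0), "id8": ([0.9], 1.0)}


def join_on_shared_ids(a, b):
    """Return shared IDs, joined feature rows and targets for people known to both parties."""
    shared = sorted(a.keys() & b.keys())
    rows = [a[i] + b[i][0] for i in shared]
    targets = [b[i][1] for i in shared]
    return shared, rows, targets


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty, used in the LASSO update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def train(rows, targets, penalty="ridge", alpha=0.1, step=0.5, iterations=500):
    """Gradient descent on (1/2n)||Xw - y||^2 plus an L2 (ridge) or L1 (lasso) penalty."""
    n, d = len(rows), len(rows[0])
    w = [0.0] * d
    for _ in range(iterations):
        residuals = [sum(wj * xj for wj, xj in zip(w, x)) - y
                     for x, y in zip(rows, targets)]
        grad = [sum(r * x[j] for r, x in zip(residuals, rows)) / n for j in range(d)]
        if penalty == "ridge":
            # Gradient of the smooth loss plus gradient of (alpha/2)||w||^2.
            w = [wj - step * (gj + alpha * wj) for wj, gj in zip(w, grad)]
        else:
            # Gradient step on the smooth loss, then soft-thresholding for alpha*||w||_1.
            w = [soft_threshold(wj - step * gj, step * alpha) for wj, gj in zip(w, grad)]
    return w


shared_ids, rows, targets = join_on_shared_ids(party_a, party_b)
ridge_weights = train(rows, targets, penalty="ridge", alpha=0.1)
lasso_weights = train(rows, targets, penalty="lasso", alpha=0.1)
```

In this sketch every value is visible to whoever runs the code. In the setting of the abstract, the intersection and the training would instead be carried out with secure protocols, so that no party learns the other parties' records.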
Notes
1. Data is said to be horizontally partitioned if every partition contains all features (columns) of a non-overlapping selection of samples (rows). Conversely, every partition of a vertically partitioned data set contains a non-overlapping selection of features for all samples.
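The two partitioning schemes in this note can be illustrated on a toy table. The sketch below is only an illustration: the table contents, the column names and the helper functions `horizontal_partition` and `vertical_partition` are hypothetical and do not come from the paper.

```python
# Toy table (hypothetical data): each key is a sample, each list entry a feature.
columns = ["age", "income", "blood_pressure"]
table = {
    "alice": [34, 52000, 120],
    "bob": [45, 61000, 135],
    "carol": [29, 48000, 110],
    "dave": [52, 75000, 140],
}


def horizontal_partition(table, sample_groups):
    """Each part keeps ALL features for a disjoint subset of samples (rows)."""
    return [{s: table[s] for s in group} for group in sample_groups]


def vertical_partition(table, columns, feature_groups):
    """Each part keeps a disjoint subset of features (columns) for ALL samples."""
    parts = []
    for group in feature_groups:
        idx = [columns.index(name) for name in group]
        parts.append({s: [row[i] for i in idx] for s, row in table.items()})
    return parts


horizontal = horizontal_partition(table, [["alice", "bob"], ["carol", "dave"]])
vertical = vertical_partition(table, columns, [["age"], ["income", "blood_pressure"]])
```

Here `horizontal` splits the people between two parties, while `vertical` gives each party different columns about the same people.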
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Veugen, T., Kamphorst, B., van de L’Isle, N., van Egmond, M.B. (2021). Privacy-Preserving Coupling of Vertically-Partitioned Databases and Subsequent Training with Gradient Descent. In: Dolev, S., Margalit, O., Pinkas, B., Schwarzmann, A. (eds) Cyber Security Cryptography and Machine Learning. CSCML 2021. Lecture Notes in Computer Science, vol 12716. Springer, Cham. https://doi.org/10.1007/978-3-030-78086-9_3
DOI: https://doi.org/10.1007/978-3-030-78086-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78085-2
Online ISBN: 978-3-030-78086-9
eBook Packages: Computer Science, Computer Science (R0)