Privacy-Preserving Coupling of Vertically-Partitioned Databases and Subsequent Training with Gradient Descent

Veugen, Thijs; Kamphorst, Bart; van de L’Isle, Natasja; van Egmond, Marie Beth

doi:10.1007/978-3-030-78086-9_3

Conference paper
First Online: 01 July 2021

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12716)

Abstract

We show how multiple data-owning parties can collaboratively train several machine learning algorithms without jeopardizing the privacy of their sensitive data. In particular, we assume that every party knows specific features of an overlapping set of people. Using a secure implementation of an advanced hidden set intersection protocol and a privacy-preserving Gradient Descent algorithm, we are able to train a Ridge, LASSO or SVM model over the intersection of people in their data sets. Both the hidden set intersection protocol and privacy-preserving LASSO implementation are unprecedented in literature.
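As a rough plaintext illustration of the pipeline described in the abstract, the sketch below joins two hypothetical vertical partitions on their shared identifiers and then runs gradient descent with either a ridge penalty or, through a soft-thresholding (proximal) step, a LASSO penalty. It performs no cryptography: the intersection is computed in the clear, so this is neither the hidden set intersection protocol nor the privacy-preserving Gradient Descent of the paper. The function names, parameters, objective scaling, and data are all assumptions made for illustration, and the SVM case is left out.

```python
from typing import Dict, List, Tuple

Features = Dict[int, List[float]]  # identifier -> feature values held by one party


def join_on_common_ids(party_a: Features, party_b: Features) -> Tuple[List[int], List[List[float]]]:
    """Plaintext stand-in for the coupling step: keep only identifiers both parties know."""
    common = sorted(set(party_a) & set(party_b))
    return common, [party_a[i] + party_b[i] for i in common]


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value towards zero by threshold (proximal step of the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def train_linear(X: List[List[float]], y: List[float], penalty: str = "ridge",
                 lam: float = 0.1, step: float = 0.05, iterations: int = 2000) -> List[float]:
    """Minimise (1/2n)||Xw - y||^2 plus (lam/2)||w||^2 (ridge) or lam*||w||_1 (lasso)."""
    if penalty not in ("ridge", "lasso"):
        raise ValueError("penalty must be 'ridge' or 'lasso'")
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iterations):
        # Residuals and gradient of the least-squares term.
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) - yi for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(residuals, X)) / n for j in range(d)]
        if penalty == "ridge":
            # Plain gradient step; the ridge penalty contributes lam * w to the gradient.
            w = [wj - step * (gj + lam * wj) for wj, gj in zip(w, grad)]
        else:
            # Gradient step on the smooth part, then soft-thresholding for the L1 part.
            w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad)]
    return w


# Hypothetical toy data: each party holds one feature; labels follow y = 2*x1 - x2.
party_a = {1: [0.5], 2: [1.0], 3: [2.0], 4: [3.0], 5: [4.0]}
party_b = {2: [1.0], 3: [-1.0], 4: [0.5], 5: [-2.0], 6: [7.0]}
labels = {2: 1.0, 3: 5.0, 4: 5.5, 5: 10.0}

ids, X = join_on_common_ids(party_a, party_b)
y = [labels[i] for i in ids]
ridge_weights = train_linear(X, y, penalty="ridge", lam=0.1)
lasso_weights = train_linear(X, y, penalty="lasso", lam=0.1)
```

Only the identifiers known to both parties (2 to 5 in this toy data) contribute to training. In a secure setting neither the joined matrix nor the intermediate weights would be visible to any single party, which is exactly what this plaintext version does not provide.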

Notes

1. Data is said to be horizontally partitioned if every partition contains all features (columns) of a non-overlapping selection of samples (rows). In contrast, every partition of a vertically-partitioned data set contains a non-overlapping selection of features for all samples.
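The two partitioning schemes in this note can be made concrete with a small sketch. The table, the feature names, and the helper functions below are hypothetical. Keeping an `id` column in every vertical partition is an added assumption, made so that the parts could later be coupled again.

```python
from typing import Dict, List, Sequence

Row = Dict[str, float]


def horizontal_partition(rows: List[Row], sizes: Sequence[int]) -> List[List[Row]]:
    """Split rows into consecutive, non-overlapping blocks; every block keeps all columns."""
    if sum(sizes) != len(rows):
        raise ValueError("sizes must add up to the number of rows")
    parts, start = [], 0
    for size in sizes:
        parts.append(rows[start:start + size])
        start += size
    return parts


def vertical_partition(rows: List[Row], feature_groups: Sequence[Sequence[str]],
                       key: str = "id") -> List[List[Row]]:
    """Split columns into non-overlapping groups; every partition keeps all rows."""
    seen = set()
    for group in feature_groups:
        if seen.intersection(group):
            raise ValueError("feature groups must not overlap")
        seen.update(group)
    # Assumption: each partition also carries the key column so rows can be matched later.
    return [[{key: row[key], **{f: row[f] for f in group}} for row in rows]
            for group in feature_groups]


# Hypothetical table: one row per person, one column per feature.
records = [
    {"id": 1, "age": 54, "bmi": 27.1, "smoker": 0},
    {"id": 2, "age": 61, "bmi": 31.4, "smoker": 1},
    {"id": 3, "age": 47, "bmi": 22.8, "smoker": 0},
]

by_rows = horizontal_partition(records, sizes=[2, 1])
by_columns = vertical_partition(records, feature_groups=[["age"], ["bmi", "smoker"]])
```

Here `by_rows` holds two parties that each see every column for different people, while `by_columns` holds two parties that each see different columns for the same three people.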

Author information

Authors and Affiliations

TNO, The Hague, The Netherlands
Thijs Veugen, Bart Kamphorst & Marie Beth van Egmond
Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
Thijs Veugen
TMC Data Science, Eindhoven, The Netherlands
Natasja van de L’Isle

Corresponding author

Correspondence to Thijs Veugen.

Editor information

Editors and Affiliations

Ben-Gurion University of the Negev, Be'er Sheva, Israel
Shlomi Dolev
Ben-Gurion University of the Negev, Be'er Sheva, Israel
Oded Margalit
Bar-Ilan University, Tel Aviv, Israel
Benny Pinkas
Augusta University, Augusta, GA, USA
Alexander Schwarzmann

About this paper

Cite this paper

Veugen, T., Kamphorst, B., van de L’Isle, N., van Egmond, M.B. (2021). Privacy-Preserving Coupling of Vertically-Partitioned Databases and Subsequent Training with Gradient Descent. In: Dolev, S., Margalit, O., Pinkas, B., Schwarzmann, A. (eds) Cyber Security Cryptography and Machine Learning. CSCML 2021. Lecture Notes in Computer Science, vol 12716. Springer, Cham. https://doi.org/10.1007/978-3-030-78086-9_3

DOI: https://doi.org/10.1007/978-3-030-78086-9_3
Published: 01 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78085-2
Online ISBN: 978-3-030-78086-9
eBook Packages: Computer Science, Computer Science (R0)
