Skip to main content

Privacy-Preserving Coupling of Vertically-Partitioned Databases and Subsequent Training with Gradient Descent

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 12716))

Abstract

We show how multiple data-owning parties can collaboratively train several machine learning algorithms without jeopardizing the privacy of their sensitive data. In particular, we assume that every party knows specific features of an overlapping set of people. Using a secure implementation of an advanced hidden set intersection protocol and a privacy-preserving Gradient Descent algorithm, we are able to train a Ridge, LASSO or SVM model over the intersection of people in their data sets. Both the hidden set intersection protocol and privacy-preserving LASSO implementation are unprecedented in literature.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

  1. 1.

    Data is said to be horizontally partitioned, if every partition contains all features (columns) of a non-overlapping selection of samples (rows). Alternatively, every partition of a vertically-partitioned data set contains a non-overlapping selection of features for all samples.

References

  1. Akavia, A., Shaul, H., Weiss, M., Yakhini, Z.: Linear-regression on packed encrypted data in the two-server model. In: Proceedings of the 7th ACM Workshop on Encrypted Computing & Applied Homomorphic Cryptography, WAHC 2019, pp. 21–32. Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3338469.3358942

  2. Blom, F., Bouman, N., Schoenmakers, B., Vreede, N.: Efficient Secure Ridge Regression from Randomized Gaussian Elimination. IACR Cryptology ePrint Archive (2019)

    Google Scholar 

  3. Bogdanov, D., Kamm, L., Laur, S., Sokk, V.: Rmind: a tool for cryptographically secure statistical analysis. IEEE Trans. Dependable Secure Comput. 15(3), 481–495 (2018)

    Article  Google Scholar 

  4. Buddhavarapu, P., Knox, A., Mohassel, P., Sengupta, S., Taubeneck, E., Vlaskin, V.: Private matching for compute. Cryptology ePrint Archive, Report 2020/599 (2020). https://eprint.iacr.org/2020/599

  5. Chen, H., Huang, Z., Laine, K., Rindal, P.: Labeled PSI from fully homomorphic encryption with malicious security. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, pp. 1223–1237. Association for Computing Machinery, New York (2018). https://doi.org/10.1145/3243734.3243836

  6. Chen, Y.R., Rezapour, A., Tzeng, W.G.: Privacy-preserving ridge regression on distributed data. Inf. Sci. 451–452, 34–49 (2018). https://doi.org/10.1016/j.ins.2018.03.061. http://www.sciencedirect.com/science/article/pii/S0020025518302500

  7. de Cock, M., Dowsley, R., Nascimento, A.C., Newman, S.C.: Fast, privacy preserving linear regression over distributed datasets based on pre-distributed data. In: Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security, pp. 3–14. ACM (2015)

    Google Scholar 

  8. Dankar, F.K., Brien, R., Adams, C., Matwin, S.: Secure multi-party linear regression. In: EDBT/ICDT Workshops, pp. 406–414. Citeseer (2014)

    Google Scholar 

  9. Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. arXiv:math/0307152, November 2003

  10. Du, W., Zhan, Z.: Building decision tree classifier on private data. In: Proceedings of the IEEE International Conference on Privacy, Security and Data Mining, vol. 14, pp. 1–8. Australian Computer Society, Inc. (2002)

    Google Scholar 

  11. van Egmond, M.B., et al.: Predicting heart-failure risk using privacy-preserving dataset combination and lasso regression. Submitted to BMC Medical Informatics and Decision Making (2021)

    Google Scholar 

  12. Freedman, M.J., Nissim, K., Pinkas, B.: Efficient private matching and set intersection. In: Cachin, C., Camenisch, J.L. (eds.) EUROCRYPT 2004. LNCS, vol. 3027, pp. 1–19. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24676-3_1

    Chapter  Google Scholar 

  13. Gascón, A., et al.: Privacy-preserving distributed linear regression on high-dimensional data. Proc. Privacy Enhancing Technol. 2017(4), 345–364 (2017)

    Google Scholar 

  14. Giacomelli, I., Jha, S., Page, C.D., Yoon, K.: Privacy-preserving ridge regression on distributed data. IACR Cryptology ePrint Archive 2017/707 (2017)

    Google Scholar 

  15. Hall, R., Fienberg, S.E., Nardi, Y.: Secure multiple linear regression based on homomorphic encryption. J. Off. Stat. 27(4), 669 (2011)

    Google Scholar 

  16. Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1), 55–67 (1970). https://doi.org/10.1080/00401706.1970.10488634

    Article  MATH  Google Scholar 

  17. Hu, S., Wang, Q., Wang, J., Chow, S.S.M., Zou, Q.: Securing fast learning! ridge regression over encrypted big data. In: 2016 IEEE Trustcom/BigDataSE/ISPA, pp. 19–26 (2016). https://doi.org/10.1109/TrustCom.2016.0041

  18. Jagannathan, G., Wright, R.N.: Privacy-preserving distributed k-means clustering over arbitrarily partitioned data. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 593–599. ACM (2005)

    Google Scholar 

  19. Kantarcioglu, M., Clifton, C.: Privacy-preserving distributed mining of association rules on horizontally partitioned data. IEEE Trans. Knowl. Data Eng. 16(9), 1026–1037 (2004). https://doi.org/10.1109/TKDE.2004.45

    Article  Google Scholar 

  20. Karr, A.F., Lin, X., Sanil, A.P., Reiter, J.P.: Regression on distributed databases via secure multi-party computation. In: Proceedings of the 2004 Annual National Conference on Digital Government Research, dg.o 2004, pp. 1–2. Digital Government Society of North America (2004). http://dl.acm.org/citation.cfm?id=1124191.1124299

  21. Lin, K., Chen, M.: On the design and analysis of the privacy-preserving SVM classifier. IEEE Trans. Knowl. Data Eng. 23(11), 1704–1717 (2011). https://doi.org/10.1109/TKDE.2010.193

    Article  Google Scholar 

  22. Lin, X., Clifton, C., Zhu, M.: Privacy-preserving clustering with distributed EM mixture modeling. Knowl. Inf. Syst. 8(1), 68–81 (2005)

    Article  Google Scholar 

  23. Mohassel, P., Zhang, Y.: Secureml: a system for scalable privacy-preserving machine learning. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 19–38 (2017). https://doi.org/10.1109/SP.2017.12

  24. Nikolaenko, V., Weinsberg, U., Ioannidis, S., Joye, M., Boneh, D., Taft, N.: Privacy-preserving ridge regression on hundreds of millions of records. In: 2013 IEEE Symposium on Security and Privacy, pp. 334–348. IEEE (2013)

    Google Scholar 

  25. Pedregosa, F., et al.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12(Oct), 2825–2830 (2011)

    MathSciNet  MATH  Google Scholar 

  26. Pinkas, B., Rosulek, M., Trieu, N., Yanai, A.: Spot-light: lightweight private set intersection from sparse OT extension. Cryptology ePrint Archive (2019)

    Google Scholar 

  27. Pinkas, B., Schneider, T., Tkachenko, O., Yanai, A.: Efficient circuit-based PSI with linear communication. In: Ishai, Y., Rijmen, V. (eds.) EUROCRYPT 2019. LNCS, vol. 11478, pp. 122–153. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17659-4_5

    Chapter  Google Scholar 

  28. Que, J., Jiang, X., Ohno-Machado, L.: A collaborative framework for distributed privacy-preserving support vector machine learning. In: AMIA Annual Symposium Proceedings, vol. 2012, pp. 1350–1359. American Medical Informatics Association (2012)

    Google Scholar 

  29. Sanil, A.P., Karr, A.F., Lin, X., Reiter, J.P.: Privacy preserving regression modelling via distributed computation. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2004, pp. 677–682. ACM, New York (2004). https://doi.org/10.1145/1014052.1014139

  30. Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Math. Program. 162(1–2), 83–112 (2017). https://doi.org/10.1007/s10107-016-1030-6

    Article  MathSciNet  MATH  Google Scholar 

  31. Schoenmakers, B.: MPyC - secure multiparty computation in python. https://github.com/lschoe/mpyc

  32. Shamir, A.: How to share a secret. Commun. ACM 22(11), 612–613 (1979)

    Article  MathSciNet  Google Scholar 

  33. Sumana, M., Hareesha, K.: Modelling a secure support vector machine classifier for private data. Int. J. Inf. Comput. Secur. 10(1), 25–40 (2018)

    Google Scholar 

  34. Sun, L., Mu, W.S., Qi, B., Zhou, Z.J.: A new privacy-preserving proximal support vector machine for classification of vertically partitioned data. Int. J. Mach. Learn. Cybern. 6(1), 109–118 (2015). https://doi.org/10.1007/s13042-014-0245-1

    Article  Google Scholar 

  35. Suykens, J., Vandewalle, J.: Least squares support vector machine classifiers. Neural Process. Lett. 9(3), 293–300 (1999). https://doi.org/10.1023/A:1018628609742

    Article  Google Scholar 

  36. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58(1), 267–288 (1996). https://doi.org/10.1111/j.2517-6161.1996.tb02080.x

    Article  MathSciNet  MATH  Google Scholar 

  37. Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 206–215. ACM (2003)

    Google Scholar 

  38. Vaidya, J., Clifton, C.: Privacy preserving Naive Bayes classifier for vertically partitioned data. In: Proceedings of the 2004 SIAM International Conference on Data Mining, pp. 522–526. SIAM (2004)

    Google Scholar 

  39. Vaidya, J., Clifton, C.: Privacy-preserving decision trees over vertically partitioned data. In: Jajodia, S., Wijesekera, D. (eds.) DBSec 2005. LNCS, vol. 3654, pp. 139–152. Springer, Heidelberg (2005). https://doi.org/10.1007/11535706_11

    Chapter  MATH  Google Scholar 

  40. Vaidya, J., Clifton, C.: Secure set intersection cardinality with application to association rule mining. J. Comput. Secur. 13(4), 593–622 (2005)

    Article  Google Scholar 

  41. Vaidya, J., Yu, H., Jiang, X.: Privacy-preserving SVM classification. Knowl. Inf. Syst. 14(2), 161–178 (2008). https://doi.org/10.1007/s10115-007-0073-7

    Article  Google Scholar 

  42. Wright, R., Yang, Z.: Privacy-preserving Bayesian network structure computation on distributed heterogeneous data. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 713–718. ACM (2004)

    Google Scholar 

  43. Yu, H., Jiang, X., Vaidya, J.: Privacy-preserving SVM using nonlinear kernels on horizontally partitioned data. In: Proceedings of the 2006 ACM Symposium on Applied Computing, SAC 2006, pp. 603–610. ACM, New York (2006). https://doi.org/10.1145/1141277.1141415

  44. Yu, H., Vaidya, J., Jiang, X.: Privacy-preserving SVM classification on vertically partitioned data. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, pp. 647–656. Springer, Heidelberg (2006). https://doi.org/10.1007/11731139_74

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Thijs Veugen .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Veugen, T., Kamphorst, B., van de L’Isle, N., van Egmond, M.B. (2021). Privacy-Preserving Coupling of Vertically-Partitioned Databases and Subsequent Training with Gradient Descent. In: Dolev, S., Margalit, O., Pinkas, B., Schwarzmann, A. (eds) Cyber Security Cryptography and Machine Learning. CSCML 2021. Lecture Notes in Computer Science(), vol 12716. Springer, Cham. https://doi.org/10.1007/978-3-030-78086-9_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-78086-9_3

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78085-2

  • Online ISBN: 978-3-030-78086-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics