Skip to main content

Privacy-Preserving Decision Trees Training and Prediction

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2020)

Abstract

In the era of cloud computing and machine learning, data has become a highly valuable resource. Recent history has shown that the benefits brought forth by this data driven culture come at a cost of potential data leakage. Such breaches have a devastating impact on individuals and industry, and lead the community to seek privacy preserving solutions. A promising approach is to utilize Fully Homomorphic Encryption (\(\mathsf {FHE }\)) to enable machine learning over encrypted data, thus providing resiliency against information leakage. However, computing over encrypted data incurs a high computational overhead, thus requiring the redesign of algorithms, in an “\(\mathsf {FHE }\)-friendly” manner, to maintain their practicality.

In this work we focus on the ever-popular tree based methods (e.g., boosting, random forests), and propose a new privacy-preserving solution to training and prediction for trees. Our solution employs a low-degree approximation for the step-function together with a lightweight interactive protocol, to replace components of the vanilla algorithm that are costly over encrypted data. Our protocols for decision trees achieve practical usability demonstrated on standard UCI datasets, encrypted with fully homomorphic encryption. In addition, the communication complexity of our protocols is independent of the tree size and dataset size in prediction and training, respectively, which significantly improves on prior works.

The first author thanks the Israel Science Foundation (grant 3380/19) and Israel National Cyber Directorate via the Haifa, BIU and Tel-Aviv cyber centers for their support. The authors wish to thank Yaron Sheffer for helpful discussions.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    We remark that in [1] we prove privacy against malicious adversaries (stronger) and require circuit-private \(\mathsf {FHE }\). Here the adversary is semi-honest and our proof does not require circuit-privacy; our proof is the same as the proof of Theorem 2 in [1] except for encrypting random elements in \(\mathcal {M}\) rather than executing \(\mathsf {Eval}\).

References

  1. Akavia, A., Leibovich, M., Resheff, Y.S., Ron, R., Shahar, M., Vald, M.: Privacy-preserving decision tree training and prediction against malicious server. Cryptology ePrint Archive, Report 2019/1282 (2019)

    Google Scholar 

  2. Barni, M., Failla, P., Kolesnikov, V., Lazzeretti, R., Sadeghi, A.-R., Schneider, T.: Secure evaluation of private linear branching programs with medical applications. In: Backes, M., Ning, P. (eds.) ESORICS 2009. LNCS, vol. 5789, pp. 424–439. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04444-1_26

    Chapter  Google Scholar 

  3. Blatt, M., Gusev, A., Polyakov, Y., Rohloff, K., Vaikuntanathan, V.: Optimized homomorphic encryption solution for secure genome-wide association studies. Cryptology ePrint Archive, Report 2019/223 (2019). https://eprint.iacr.org/2019/223

  4. Bost, R., Popa, R.A., Tu, S., Goldwasser, S.: Machine learning classification over encrypted data. In: NDSS, vol. 4324, p. 4325 (2015)

    Google Scholar 

  5. Brakerski, Z.: Fully homomorphic encryption without modulus switching from classical GapSVP. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 868–886. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32009-5_50

    Chapter  Google Scholar 

  6. Brakerski, Z., Gentry, C., Vaikuntanathan, V.: (leveled) fully homomorphic encryption without bootstrapping. In: Innovations in Theoretical Computer Science 2012, Cambridge, MA, USA, 8–10 January 2012, pp. 309–325 (2012)

    Google Scholar 

  7. Brickell, J., Porter, D.E., Shmatikov, V., Witchel, E.: Privacy-preserving remote diagnostics. In: Proceedings of the 14th ACM Conference on Computer and Communications Security, pp. 498–507. ACM (2007)

    Google Scholar 

  8. Chen, H., et al.: Logistic regression over encrypted data from fully homomorphic encryption. BMC Med. Genomics 11(4), 81 (2018)

    Article  Google Scholar 

  9. Chen, H., et al.: Logistic regression over encrypted data from fully homomorphic encryption. BMC Med. Genomics 11, 81 (2018). https://doi.org/10.1186/s12920-018-0397-z

    Article  Google Scholar 

  10. Cheon, J.H., Kim, A., Kim, M., Song, Y.: Homomorphic encryption for arithmetic of approximate numbers. In: Takagi, T., Peyrin, T. (eds.) ASIACRYPT 2017. LNCS, vol. 10624, pp. 409–437. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70694-8_15

    Chapter  Google Scholar 

  11. De Cock, M., et al.: Efficient and private scoring of decision trees, support vector machines and logistic regression models based on pre-computation. IEEE Trans. Dependable Secure Comput. 16(2), 217–230 (2017)

    Article  Google Scholar 

  12. Du, W., Zhan, Z.: Building decision tree classifier on private data. In: Proceedings of the IEEE International Conference on Privacy, Security and Data Mining, vol. 14, pp. 1–8. Australian Computer Society, Inc. (2002)

    Google Scholar 

  13. Dua, D., Graff, C.: UCI machine learning repository (2017)

    Google Scholar 

  14. Emekci, F., Sahin, O.D., Agrawal, D., El Abbadi, A.: Privacy preserving decision tree learning over multiple parties. Data Knowl. Eng. 63(2), 348–361 (2007)

    Article  Google Scholar 

  15. Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. IACR Cryptology ePrint Archive 2012, 144 (2012)

    Google Scholar 

  16. Gentry, C.: A fully homomorphic encryption scheme. Ph.D. thesis. Stanford University (2009). https://crypto.stanford.edu/craig/

  17. Gilad-Bachrach, R., Dowlin, N., Laine, K., Lauter, K., Naehrig, M., Wernsing, J.: CryptoNets: applying neural networks to encrypted data with high throughput and accuracy. In: International Conference on Machine Learning, pp. 201–210 (2016)

    Google Scholar 

  18. Halevi, S.: Homomorphic encryption. Tutorials on the Foundations of Cryptography. ISC, pp. 219–276. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57048-8_5

    Chapter  Google Scholar 

  19. Hazay, C., Lindell, Y.: Efficient Secure Two-Party Protocols. ISC. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14303-8

    Book  MATH  Google Scholar 

  20. Hesamifard, E., Takabi, H., Ghasemi, M., Wright, R.: Privacy-preserving machine learning as a service. In: Proceedings on Privacy Enhancing Technologies 2018, pp. 123–142 (06 2018)

    Google Scholar 

  21. de Hoogh, S., Schoenmakers, B., Chen, P., op den Akker, H.: Practical secure decision tree learning in a teletreatment application. In: Christin, N., Safavi-Naini, R. (eds.) FC 2014. LNCS, vol. 8437, pp. 179–194. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45472-5_12

    Chapter  Google Scholar 

  22. Joye, M., Salehi, F.: Private yet efficient decision tree evaluation. In: Kerschbaum, F., Paraboschi, S. (eds.) DBSec 2018. LNCS, vol. 10980, pp. 243–259. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-95729-6_16

    Chapter  Google Scholar 

  23. Katz, J., Lindell, Y.: Introduction to Modern Cryptography. Chapman & Hall/CRC Cryptography and Network Security Series. Chapman & Hall/CRC, Boca Raton (2007)

    Book  Google Scholar 

  24. Kim, A., Song, Y., Kim, M., Lee, K., Cheon, J.: Logistic regression model training based on the approximate homomorphic encryption. BMC Med. Genomics 11, 23–31 (2018)

    Article  Google Scholar 

  25. Kim, A., Song, Y., Kim, M., Lee, K., Cheon, J.H.: Logistic regression model training based on the approximate homomorphic encryption. BMC Med. Genomics 11(4), 83 (2018)

    Article  Google Scholar 

  26. Kim, M., Song, Y., Wang, S., Xia, Y., Jiang, X.: Secure logistic regression based on homomorphic encryption: design and evaluation. JMIR Med. Inf. 6, e19 (2017)

    Article  Google Scholar 

  27. Kiss, Á., Naderpour, M., Liu, J., Asokan, N., Schneider, T.: SoK: modular and efficient private decision tree evaluation. PoPETs 2019(2), 187–208 (2019)

    Google Scholar 

  28. Kyoohyung, H., Hong, S., Cheon, J., Park, D.: Logistic regression on homomorphic encrypted data at scale. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 9466–9471, July 2019

    Google Scholar 

  29. Lindell, Y., Pinkas, B.: Privacy preserving data mining. In: Bellare, M. (ed.) CRYPTO 2000. LNCS, vol. 1880, pp. 36–54. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-44598-6_3

    Chapter  Google Scholar 

  30. Lory, P.: Enhancing the efficiency in privacy preserving learning of decision trees in partitioned databases. In: Domingo-Ferrer, J., Tinnirello, I. (eds.) PSD 2012. LNCS, vol. 7556, pp. 322–335. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33627-0_25

    Chapter  Google Scholar 

  31. Nandakumar, K., Ratha, N.K., Pankanti, S., Halevi, S.: Towards deep neural network training on encrypted data. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2019, Long Beach, CA, USA, 16–20 June 2019. p. 0. Computer Vision Foundation/IEEE (2019)

    Google Scholar 

  32. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

    MathSciNet  MATH  Google Scholar 

  33. Remez, E.Y.: Sur la détermination des polynômes d’approximation de degré donnée. Comm. Soc. Math. Kharkov 10(4163), 196 (1934)

    Google Scholar 

  34. Rivest, R.L., Adleman, L., Dertouzos, M.L.: On data banks and privacy homomorphisms. Found. Sec. Comput. 4, 169–179 (1978)

    MathSciNet  Google Scholar 

  35. Rivlin, T.J.: An Introduction to the Approximation of Functions. Courier Corporation, North Chelmsford (2003)

    Google Scholar 

  36. Samet, S., Miri, A.: Privacy preserving ID3 using Gini index over horizontally partitioned data. In: Proceedings of the 2008 IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2008, pp. 645–651. IEEE Computer Society, Washington, DC (2008)

    Google Scholar 

  37. Microsoft SEAL (release 3.3). Microsoft Research, Redmond (2019). https://github.com/Microsoft/SEAL

  38. Tai, R.K.H., Ma, J.P.K., Zhao, Y., Chow, S.S.M.: Privacy-preserving decision trees evaluation via linear functions. In: Foley, S.N., Gollmann, D., Snekkenes, E. (eds.) ESORICS 2017. LNCS, vol. 10493, pp. 494–512. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66399-9_27

    Chapter  Google Scholar 

  39. Tueno, A., Kerschbaum, F., Katzenbeisser, S.: Private evaluation of decision trees using sublinear cost. PoPETs 2019(1), 266–286 (2019)

    Google Scholar 

  40. Vaidya, J., Clifton, C., Kantarcioglu, M., Patterson, A.S.: Privacy-preserving decision trees over vertically partitioned data. ACM Trans. Knowl. Disc. Data (TKDD) 2(3), 14 (2008)

    Google Scholar 

  41. Wang, K., Xu, Y., She, R., Yu, P.S.: Classification spanning private databases. In: Proceedings, The Twenty-First National Conference on Artificial Intelligence and the Eighteenth Innovative Applications of Artificial Intelligence Conference, 16–20 July 2006, Boston, Massachusetts, USA, pp. 293–298. AAAI Press (2006)

    Google Scholar 

  42. Wu, D.J., Feng, T., Naehrig, M., Lauter, K.: Privately evaluating decision trees and random forests. Proc. Priv. Enhancing Technol. 2016(4), 335–355 (2016)

    Article  Google Scholar 

  43. Xiao, M.J., Huang, L.S., Luo, Y.L., Shen, H.: Privacy preserving ID3 algorithm over horizontally partitioned data. In: Proceedings of the Sixth International Conference on Parallel and Distributed Computing Applications and Technologies, PDCAT 2005, pp. 239–243. IEEE Computer Society, Washington, DC(2005)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Adi Akavia or Margarita Vald .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Akavia, A., Leibovich, M., Resheff, Y.S., Ron, R., Shahar, M., Vald, M. (2021). Privacy-Preserving Decision Trees Training and Prediction. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science(), vol 12457. Springer, Cham. https://doi.org/10.1007/978-3-030-67658-2_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-67658-2_9

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67657-5

  • Online ISBN: 978-3-030-67658-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics