Privacy-Preserving Decision Trees Training and Prediction

Akavia, Adi; Leibovich, Max; Resheff, Yehezkel S.; Ron, Roey; Shahar, Moni; Vald, Margarita

doi:10.1007/978-3-030-67658-2_9

Adi Akavia ORCID: orcid.org/0000-0003-0853-3576¹²,
Max Leibovich¹²,
Yehezkel S. Resheff¹³,
Roey Ron¹³,
Moni Shahar¹⁴ &
…
Margarita Vald^13,15

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 12457))

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

1790 Accesses
5 Citations

Abstract

In the era of cloud computing and machine learning, data has become a highly valuable resource. Recent history has shown that the benefits brought forth by this data driven culture come at a cost of potential data leakage. Such breaches have a devastating impact on individuals and industry, and lead the community to seek privacy preserving solutions. A promising approach is to utilize Fully Homomorphic Encryption (\(\mathsf {FHE }\)) to enable machine learning over encrypted data, thus providing resiliency against information leakage. However, computing over encrypted data incurs a high computational overhead, thus requiring the redesign of algorithms, in an “\(\mathsf {FHE }\)-friendly” manner, to maintain their practicality.

In this work we focus on the ever-popular tree based methods (e.g., boosting, random forests), and propose a new privacy-preserving solution to training and prediction for trees. Our solution employs a low-degree approximation for the step-function together with a lightweight interactive protocol, to replace components of the vanilla algorithm that are costly over encrypted data. Our protocols for decision trees achieve practical usability demonstrated on standard UCI datasets, encrypted with fully homomorphic encryption. In addition, the communication complexity of our protocols is independent of the tree size and dataset size in prediction and training, respectively, which significantly improves on prior works.

The first author thanks the Israel Science Foundation (grant 3380/19) and Israel National Cyber Directorate via the Haifa, BIU and Tel-Aviv cyber centers for their support. The authors wish to thank Yaron Sheffer for helpful discussions.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
We remark that in [1] we prove privacy against malicious adversaries (stronger) and require circuit-private \(\mathsf {FHE }\). Here the adversary is semi-honest and our proof does not require circuit-privacy; our proof is the same as the proof of Theorem 2 in [1] except for encrypting random elements in \(\mathcal {M}\) rather than executing \(\mathsf {Eval}\).

References

Akavia, A., Leibovich, M., Resheff, Y.S., Ron, R., Shahar, M., Vald, M.: Privacy-preserving decision tree training and prediction against malicious server. Cryptology ePrint Archive, Report 2019/1282 (2019)
Google Scholar
Barni, M., Failla, P., Kolesnikov, V., Lazzeretti, R., Sadeghi, A.-R., Schneider, T.: Secure evaluation of private linear branching programs with medical applications. In: Backes, M., Ning, P. (eds.) ESORICS 2009. LNCS, vol. 5789, pp. 424–439. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04444-1_26
Chapter Google Scholar
Blatt, M., Gusev, A., Polyakov, Y., Rohloff, K., Vaikuntanathan, V.: Optimized homomorphic encryption solution for secure genome-wide association studies. Cryptology ePrint Archive, Report 2019/223 (2019). https://eprint.iacr.org/2019/223
Bost, R., Popa, R.A., Tu, S., Goldwasser, S.: Machine learning classification over encrypted data. In: NDSS, vol. 4324, p. 4325 (2015)
Google Scholar
Brakerski, Z.: Fully homomorphic encryption without modulus switching from classical GapSVP. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 868–886. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32009-5_50
Chapter Google Scholar
Brakerski, Z., Gentry, C., Vaikuntanathan, V.: (leveled) fully homomorphic encryption without bootstrapping. In: Innovations in Theoretical Computer Science 2012, Cambridge, MA, USA, 8–10 January 2012, pp. 309–325 (2012)
Google Scholar
Brickell, J., Porter, D.E., Shmatikov, V., Witchel, E.: Privacy-preserving remote diagnostics. In: Proceedings of the 14th ACM Conference on Computer and Communications Security, pp. 498–507. ACM (2007)
Google Scholar
Chen, H., et al.: Logistic regression over encrypted data from fully homomorphic encryption. BMC Med. Genomics 11(4), 81 (2018)
Article Google Scholar
Chen, H., et al.: Logistic regression over encrypted data from fully homomorphic encryption. BMC Med. Genomics 11, 81 (2018). https://doi.org/10.1186/s12920-018-0397-z
Article Google Scholar
Cheon, J.H., Kim, A., Kim, M., Song, Y.: Homomorphic encryption for arithmetic of approximate numbers. In: Takagi, T., Peyrin, T. (eds.) ASIACRYPT 2017. LNCS, vol. 10624, pp. 409–437. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70694-8_15
Chapter Google Scholar
De Cock, M., et al.: Efficient and private scoring of decision trees, support vector machines and logistic regression models based on pre-computation. IEEE Trans. Dependable Secure Comput. 16(2), 217–230 (2017)
Article Google Scholar
Du, W., Zhan, Z.: Building decision tree classifier on private data. In: Proceedings of the IEEE International Conference on Privacy, Security and Data Mining, vol. 14, pp. 1–8. Australian Computer Society, Inc. (2002)
Google Scholar
Dua, D., Graff, C.: UCI machine learning repository (2017)
Google Scholar
Emekci, F., Sahin, O.D., Agrawal, D., El Abbadi, A.: Privacy preserving decision tree learning over multiple parties. Data Knowl. Eng. 63(2), 348–361 (2007)
Article Google Scholar
Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. IACR Cryptology ePrint Archive 2012, 144 (2012)
Google Scholar
Gentry, C.: A fully homomorphic encryption scheme. Ph.D. thesis. Stanford University (2009). https://crypto.stanford.edu/craig/
Gilad-Bachrach, R., Dowlin, N., Laine, K., Lauter, K., Naehrig, M., Wernsing, J.: CryptoNets: applying neural networks to encrypted data with high throughput and accuracy. In: International Conference on Machine Learning, pp. 201–210 (2016)
Google Scholar
Halevi, S.: Homomorphic encryption. Tutorials on the Foundations of Cryptography. ISC, pp. 219–276. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57048-8_5
Chapter Google Scholar
Hazay, C., Lindell, Y.: Efficient Secure Two-Party Protocols. ISC. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14303-8
Book MATH Google Scholar
Hesamifard, E., Takabi, H., Ghasemi, M., Wright, R.: Privacy-preserving machine learning as a service. In: Proceedings on Privacy Enhancing Technologies 2018, pp. 123–142 (06 2018)
Google Scholar
de Hoogh, S., Schoenmakers, B., Chen, P., op den Akker, H.: Practical secure decision tree learning in a teletreatment application. In: Christin, N., Safavi-Naini, R. (eds.) FC 2014. LNCS, vol. 8437, pp. 179–194. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45472-5_12
Chapter Google Scholar
Joye, M., Salehi, F.: Private yet efficient decision tree evaluation. In: Kerschbaum, F., Paraboschi, S. (eds.) DBSec 2018. LNCS, vol. 10980, pp. 243–259. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-95729-6_16
Chapter Google Scholar
Katz, J., Lindell, Y.: Introduction to Modern Cryptography. Chapman & Hall/CRC Cryptography and Network Security Series. Chapman & Hall/CRC, Boca Raton (2007)
Book Google Scholar
Kim, A., Song, Y., Kim, M., Lee, K., Cheon, J.: Logistic regression model training based on the approximate homomorphic encryption. BMC Med. Genomics 11, 23–31 (2018)
Article Google Scholar
Kim, A., Song, Y., Kim, M., Lee, K., Cheon, J.H.: Logistic regression model training based on the approximate homomorphic encryption. BMC Med. Genomics 11(4), 83 (2018)
Article Google Scholar
Kim, M., Song, Y., Wang, S., Xia, Y., Jiang, X.: Secure logistic regression based on homomorphic encryption: design and evaluation. JMIR Med. Inf. 6, e19 (2017)
Article Google Scholar
Kiss, Á., Naderpour, M., Liu, J., Asokan, N., Schneider, T.: SoK: modular and efficient private decision tree evaluation. PoPETs 2019(2), 187–208 (2019)
Google Scholar
Kyoohyung, H., Hong, S., Cheon, J., Park, D.: Logistic regression on homomorphic encrypted data at scale. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 9466–9471, July 2019
Google Scholar
Lindell, Y., Pinkas, B.: Privacy preserving data mining. In: Bellare, M. (ed.) CRYPTO 2000. LNCS, vol. 1880, pp. 36–54. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-44598-6_3
Chapter Google Scholar
Lory, P.: Enhancing the efficiency in privacy preserving learning of decision trees in partitioned databases. In: Domingo-Ferrer, J., Tinnirello, I. (eds.) PSD 2012. LNCS, vol. 7556, pp. 322–335. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33627-0_25
Chapter Google Scholar
Nandakumar, K., Ratha, N.K., Pankanti, S., Halevi, S.: Towards deep neural network training on encrypted data. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2019, Long Beach, CA, USA, 16–20 June 2019. p. 0. Computer Vision Foundation/IEEE (2019)
Google Scholar
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
MathSciNet MATH Google Scholar
Remez, E.Y.: Sur la détermination des polynômes d’approximation de degré donnée. Comm. Soc. Math. Kharkov 10(4163), 196 (1934)
Google Scholar
Rivest, R.L., Adleman, L., Dertouzos, M.L.: On data banks and privacy homomorphisms. Found. Sec. Comput. 4, 169–179 (1978)
MathSciNet Google Scholar
Rivlin, T.J.: An Introduction to the Approximation of Functions. Courier Corporation, North Chelmsford (2003)
Google Scholar
Samet, S., Miri, A.: Privacy preserving ID3 using Gini index over horizontally partitioned data. In: Proceedings of the 2008 IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2008, pp. 645–651. IEEE Computer Society, Washington, DC (2008)
Google Scholar
Microsoft SEAL (release 3.3). Microsoft Research, Redmond (2019). https://github.com/Microsoft/SEAL
Tai, R.K.H., Ma, J.P.K., Zhao, Y., Chow, S.S.M.: Privacy-preserving decision trees evaluation via linear functions. In: Foley, S.N., Gollmann, D., Snekkenes, E. (eds.) ESORICS 2017. LNCS, vol. 10493, pp. 494–512. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66399-9_27
Chapter Google Scholar
Tueno, A., Kerschbaum, F., Katzenbeisser, S.: Private evaluation of decision trees using sublinear cost. PoPETs 2019(1), 266–286 (2019)
Google Scholar
Vaidya, J., Clifton, C., Kantarcioglu, M., Patterson, A.S.: Privacy-preserving decision trees over vertically partitioned data. ACM Trans. Knowl. Disc. Data (TKDD) 2(3), 14 (2008)
Google Scholar
Wang, K., Xu, Y., She, R., Yu, P.S.: Classification spanning private databases. In: Proceedings, The Twenty-First National Conference on Artificial Intelligence and the Eighteenth Innovative Applications of Artificial Intelligence Conference, 16–20 July 2006, Boston, Massachusetts, USA, pp. 293–298. AAAI Press (2006)
Google Scholar
Wu, D.J., Feng, T., Naehrig, M., Lauter, K.: Privately evaluating decision trees and random forests. Proc. Priv. Enhancing Technol. 2016(4), 335–355 (2016)
Article Google Scholar
Xiao, M.J., Huang, L.S., Luo, Y.L., Shen, H.: Privacy preserving ID3 algorithm over horizontally partitioned data. In: Proceedings of the Sixth International Conference on Parallel and Distributed Computing Applications and Technologies, PDCAT 2005, pp. 239–243. IEEE Computer Society, Washington, DC(2005)
Google Scholar

Download references

Author information

Authors and Affiliations

University of Haifa, Haifa, Israel
Adi Akavia & Max Leibovich
Intuit Inc., Petah Tikva, Israel
Yehezkel S. Resheff, Roey Ron & Margarita Vald
Facebook Inc., Tel Aviv-Yafo, Israel
Moni Shahar
Tel Aviv University, Tel Aviv-Yafo, Israel
Margarita Vald

Authors

Adi Akavia
View author publications
You can also search for this author in PubMed Google Scholar
Max Leibovich
View author publications
You can also search for this author in PubMed Google Scholar
Yehezkel S. Resheff
View author publications
You can also search for this author in PubMed Google Scholar
Roey Ron
View author publications
You can also search for this author in PubMed Google Scholar
Moni Shahar
View author publications
You can also search for this author in PubMed Google Scholar
Margarita Vald
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Adi Akavia or Margarita Vald .

Editor information

Editors and Affiliations

Albert-Ludwigs-Universität, Freiburg, Germany
Frank Hutter
TU Darmstadt, Darmstadt, Germany
Kristian Kersting
Ghent University, Ghent, Belgium
Jefrey Lijffijt
Saarland University, Saarbrücken, Germany
Isabel Valera

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Akavia, A., Leibovich, M., Resheff, Y.S., Ron, R., Shahar, M., Vald, M. (2021). Privacy-Preserving Decision Trees Training and Prediction. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science(), vol 12457. Springer, Cham. https://doi.org/10.1007/978-3-030-67658-2_9

Download citation

DOI: https://doi.org/10.1007/978-3-030-67658-2_9
Published: 25 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67657-5
Online ISBN: 978-3-030-67658-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

the ECML PKDD community (opens in a new tab)