Abstract
Differentially private decision tree algorithms have been popular since the introduction of differential privacy. While many private tree-based algorithms have been proposed for supervised learning tasks, such as classification, very few extend naturally to the semi-supervised setting. In this paper, we present a framework that takes advantage of unlabelled data to reduce the noise requirement in differentially private decision forests and improves their predictive performance. The main ingredients in our approach consist of a median splitting criterion that creates balanced leaves, a geometric privacy budget allocation technique, and a random sampling technique to compute the private splitting-point accurately. While similar ideas existed in isolation, their combination is new, and has several advantages: (1) The semi-supervised mode of operation comes for free. (2) Our framework is applicable in two different privacy settings: when label-privacy is required, and when privacy of the features is also required. (3) Empirical evidence on 18 UCI data sets and 3 synthetic data sets demonstrate that our algorithm achieves high utility performance compared to the current state of the art in both supervised and semi-supervised classification problems.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Aryal, S., Ting, K.M., Washio, T., Haffari, G.: Data-dependent dissimilarity measure: an effective alternative to geometric distance measures. Knowl. Inf. Syst. 63, 479–506 (2017)
Beniley, J.: Multidimensional binary search trees used for associative searching. ACM Commun. 18(9), 509–517 (1975)
Biau, G.: Analysis of a random forests model. J. Mach. Learn. Res. 13(1), 1063–1095 (2012)
Breiman, L.: Consistency for a simple model of random forests (2004)
Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and regression trees. CRC Press (1984)
Consul, S., William, S.A.: Differentially private random forests for regression and classification. Association for the Advancement of Artificial Intelligence (2021)
Cormode, G., Procopiuc, C., Srivastava, D., Shen, E., Yu, T.: Differentially private spatial decompositions. In: 2012 IEEE 28th International Conference on Data Engineering. IEEE (2012)
Dua, D., Graff, C.: UCI machine learning repository (2017)
Dwork, C.: Differential privacy. Automata, Languages and Programming, pp. 1–12 (2006)
Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9(3–4), 211–407 (2014)
Fletcher, S., Islam, M.Z.: A differentially private decision forest. In: Proceedings of the 13-th Australasian Data Mining Conference (2015)
Fletcher, S., Islam, M.Z.: Differentially private random decision forests using smooth sensitivity. Expert Syst. Appl. 78, 16–31 (2017)
Friedman, A., Schuster, A.: Data mining with differential privacy. In: Proceedings of the 16th SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 493–502 (2010)
Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 63(1), 3–42 (2006)
Gong, M., Xie, Y., Pan, K., Feng, K., Qin, A.: A survey on differentially private machine learning. IEEE Comput. Intell. Mag. 15(2), 49–64 (2020)
Jagannathan, G., Monteleoni, C., Pillaipakkamnatt, K.: A semi-supervised learning approach to differential privacy. In: 2013 IEEE 13th International Conference on Data Mining Workshops, pp. 841–848. IEEE (2013)
Jagannathan, G., Pillaipakkamnatt, K., Wright, R.N.: A practical differentially private random decision tree classifier. In: 2009 IEEE International Conference on Data Mining Workshops, pp. 114–121. IEEE (2009)
Klusowski, J.: Sharp analysis of a simple model for random forests. In: International Conference on Artificial Intelligence and Statistics (2021)
Liu, X., Li, Q., Li, T., Chen, D.: Differentially private classification with decision tree ensemble. Appl. Soft Comput. 62, 807–816 (2018)
Long, X., Sakuma, J.: Differentially private semi-supervised classification. In: 2017 IEEE International Conference on Smart Computing (SMARTCOMP), pp. 1–6. IEEE (2017)
McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS2007), pp. 94–103. IEEE (2007)
McSherry, F.D.: Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 19–30 (2009)
Mohammed, N., Chen, R., Fung, B.C., Yu, P.S.: Differentially private data release for data mining. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 493–501 (2011)
Nissim, K., Raskhodnikova, S., Smith, A.: Smooth sensitivity and sampling in private data analysis. In: Proceedings of the Thirty-ninth Annual ACM Symposium on Theory of Computing, pp. 75–84 (2007)
Pham, A.T., Xi, J.: Differentially private semi-supervised learning with known class priors. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 801–810. IEEE (2018)
Rana, S., Gupta, S.K., Venkatesh, S.: Differentially private random forest with high utility. In: 2015 IEEE International Conference on Data Mining, pp. 955–960. IEEE (2015)
Xin, B., Yang, W., Wang, S., Huang, L.: Differentially private greedy decision forest. In: ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2672–2676 (2019)
Acknowledgements
The last author is funded by EPSRC grant EP/P004245/1.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
1 Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Huang, Z., Lei, Y., Kabán, A. (2023). Noise-Efficient Learning of Differentially Private Partitioning Machine Ensembles. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science(), vol 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_36
Download citation
DOI: https://doi.org/10.1007/978-3-031-26412-2_36
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26411-5
Online ISBN: 978-3-031-26412-2
eBook Packages: Computer ScienceComputer Science (R0)