Abstract
Differential privacy (DP) has become one of the most important solutions for privacy protection in recent years. Previous studies have shown that prediction accuracy usually increases as more data mining (DM) logic is considered in the DP implementation. However, although one-step DM computation for the decision tree (DT) model has been investigated, existing research has not studied scenarios in which DP is embedded in two-step DM computation, three-step DM computation, and so on up to the DM computation of the whole model. Embedding DP in more than two steps of DM computation is very challenging because the solution space grows exponentially with the computational complexity. In this work, we propose algorithms that use the Markov Chain Monte Carlo (MCMC) method, which can efficiently search a computationally infeasible space, to embed DP into the DT generation algorithm. We compare the performance of embedding DP in DT at different depths, i.e., one-step DM computation (previous work), two-step, three-step, and the whole model. We find that a deeper combination of DP and DT does help to increase prediction accuracy. However, when the privacy budget is very large (e.g., ϵ = 10), the budget may overwhelm the complexity of the DT model, and the increasing trend is not obvious. We also find that prediction accuracy decreases as model complexity increases.
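To make the idea of searching a large space with MCMC concrete, the following is a minimal sketch and not the algorithm proposed in this paper. It assumes a simplified "oblivious" tree in which every node at a given level splits on the same binary attribute, uses the number of correctly classified training records as the utility (sensitivity 1), and runs a Metropolis-Hastings chain whose stationary distribution is the exponential-mechanism distribution, proportional to exp(ϵ·u/(2Δu)). The names `utility` and `mcmc_tree_structure`, the tree representation, and the toy data are all hypothetical. A finite chain only approximates that distribution, and the sketch covers structure selection only; leaf labels would need their own privacy budget.

```python
import math
import random
from collections import Counter


def utility(levels, records):
    """Count records classified correctly when each leaf predicts its majority label.

    `levels` lists one attribute index per tree level (an assumed oblivious tree).
    Adding or removing one record changes this count by at most 1.
    """
    leaves = {}
    for features, label in records:
        key = tuple(features[a] for a in levels)
        leaves.setdefault(key, Counter())[label] += 1
    return sum(max(counts.values()) for counts in leaves.values())


def mcmc_tree_structure(records, n_attrs, depth, epsilon, steps, rng):
    """Metropolis-Hastings over tree structures, targeting exp(eps * u / (2 * sensitivity))."""
    sensitivity = 1.0  # assumption tied to the counting utility above
    current = [rng.randrange(n_attrs) for _ in range(depth)]
    u_cur = utility(current, records)
    for _ in range(steps):
        # Symmetric proposal: re-draw the attribute used at one random level.
        proposal = list(current)
        proposal[rng.randrange(depth)] = rng.randrange(n_attrs)
        u_new = utility(proposal, records)
        log_ratio = epsilon * (u_new - u_cur) / (2.0 * sensitivity)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, u_cur = proposal, u_new
    return current, u_cur


# Toy data (illustrative only): three binary attributes, label = XOR of the first two.
records = [((a, b, c), a ^ b) for a in (0, 1) for b in (0, 1) for c in (0, 1)] * 5
levels, score = mcmc_tree_structure(
    records, n_attrs=3, depth=2, epsilon=1.0, steps=200, rng=random.Random(0)
)
print(levels, score)
```

Each step evaluates only one neighboring structure, so the chain never enumerates the full space, whose size in this toy setting is `n_attrs ** depth` and grows exponentially with depth. A smaller ϵ flattens the acceptance rule and makes low-utility structures more likely to be returned.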
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61525204, 61572322), the Science and Technology Commission of Shanghai Municipality Project (Grant Nos. 14510722600, 16QA1402200), the Aeronautical Science Foundation of China (Grant No. 20145557010), and the NRF Singapore CREATE Program E2S2.
Additional information
Conflict of interest: The authors declare that they have no conflict of interest.
About this article
Cite this article
Bai, X., Yao, J., Yuan, M. et al. Embedding differential privacy in decision tree algorithm with different depths. Sci. China Inf. Sci. 60, 082104 (2017). https://doi.org/10.1007/s11432-016-0442-1
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11432-016-0442-1