Embedding differential privacy in decision tree algorithm with different depths

  • Research Paper
Science China Information Sciences

Abstract

Differential privacy (DP) has become one of the most important solutions for privacy protection in recent years. Previous studies have shown that prediction accuracy usually increases as more data mining (DM) logic is considered in the DP implementation. However, although one-step DM computation for the decision tree (DT) model has been investigated, existing research has not studied scenarios in which DP is embedded in two-step DM computation, three-step DM computation, and so on up to the DM computation of the whole model. Embedding DP in more than two steps of DM computation is very challenging because the solution space grows exponentially with the computational complexity. In this work, we propose algorithms based on the Markov Chain Monte Carlo (MCMC) method, which can efficiently search a computationally infeasible space, to embed DP into the DT generation algorithm. We compare the performance of embedding DP in DT at different depths: one-step DM computation (previous work), two-step, three-step, and the whole model. We find that a deeper combination of DP and DT does help to increase prediction accuracy. However, when the privacy budget is very large (e.g., ϵ = 10), the budget may overwhelm the effect of DT model complexity, and the increasing trend is not obvious. We also find that prediction accuracy decreases as model complexity increases.
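
The abstract does not spell out the proposed algorithms, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a multi-step split plan is selected with the standard exponential mechanism, and that a Metropolis random walk (one common MCMC scheme) is used to sample from that distribution when the plan space is too large to enumerate. The names `toy_score`, `TARGET`, `propose`, and `metropolis_sample`, as well as the utility function and its sensitivity of 1, are hypothetical and chosen for illustration.

```python
import math
import random

# Hypothetical toy setting: a "plan" is a tuple of attribute indices, one per
# split step. TARGET stands in for the plan with the best (unknown) utility.
TARGET = (2, 0)


def toy_score(plan):
    """Hypothetical utility: number of steps that match the best plan."""
    return sum(1 for chosen, best in zip(plan, TARGET) if chosen == best)


def exponential_mechanism(candidates, score, epsilon, sensitivity, rng):
    """Draw one candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity)). Requires enumerating the space."""
    scores = [score(c) for c in candidates]
    top = max(scores)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def propose(plan, n_attributes, rng):
    """Symmetric proposal: change the attribute used at one random step."""
    position = rng.randrange(len(plan))
    alternatives = [a for a in range(n_attributes) if a != plan[position]]
    changed = list(plan)
    changed[position] = rng.choice(alternatives)
    return tuple(changed)


def metropolis_sample(start, score, epsilon, sensitivity, n_attributes, steps, rng):
    """Random walk whose stationary distribution is the exponential-mechanism
    distribution above. No enumeration of the plan space is needed."""
    current, current_score = start, score(start)
    for _ in range(steps):
        candidate = propose(current, n_attributes, rng)
        candidate_score = score(candidate)
        # Log acceptance ratio for a symmetric proposal.
        log_ratio = epsilon * (candidate_score - current_score) / (2.0 * sensitivity)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_score = candidate, candidate_score
    return current


if __name__ == "__main__":
    rng = random.Random(0)
    print(metropolis_sample((1, 1), toy_score, 1.0, 1.0, 4, 200, rng))
```

With d attributes and k split steps, direct enumeration needs all d^k plans, whereas the random walk only scores one neighbouring plan per iteration. Note that a chain run for a finite number of steps only approximates the target distribution, so the privacy guarantee of such a sketch depends on how close the chain is to convergence.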

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61525204, 61572322), the Science and Technology Commission of Shanghai Municipality Project (Grant Nos. 14510722600, 16QA1402200), the Aeronautical Science Foundation of China (Grant No. 20145557010), and the NRF Singapore CREATE Program E2S2.

Author information

Corresponding authors

Correspondence to Jianguo Yao or Haibing Guan.

Additional information

Conflict of interest: The authors declare that they have no conflict of interest.

About this article

Cite this article

Bai, X., Yao, J., Yuan, M. et al. Embedding differential privacy in decision tree algorithm with different depths. Sci. China Inf. Sci. 60, 082104 (2017). https://doi.org/10.1007/s11432-016-0442-1

  • DOI: https://doi.org/10.1007/s11432-016-0442-1
