A hybrid multi-group approach for privacy-preserving data mining

  • Regular Paper
  • Knowledge and Information Systems

Abstract

We propose a hybrid multi-group approach for privacy-preserving data mining and make two contributions. First, we propose a hybrid approach. Previous work has used either the randomization approach or the secure multi-party computation (SMC) approach. These two approaches have complementary features: randomization is much more efficient but less accurate, while SMC is less efficient but more accurate. Our hybrid approach combines the strengths of both to balance accuracy against efficiency: it achieves much better accuracy than the randomization approach at a much lower computation cost than the SMC approach. Second, we propose a multi-group scheme that lets the data miner flexibly control the balance between data mining accuracy and privacy. This scheme is motivated by the fact that existing randomization schemes, which randomize data at the level of individual attributes, can produce insufficient accuracy when the number of dimensions is high. We partition the attributes into groups and develop a group-based randomization scheme that achieves better data mining accuracy. To demonstrate the effectiveness of the proposed general schemes, we have implemented them for the ID3 decision tree algorithm and for association rule mining, and we present experimental results.
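
To make the idea of group-based randomization concrete, the following is a minimal sketch, not the authors' implementation. It assumes binary attributes and a Warner-style randomized response [23] applied to a whole attribute group at once: with probability `theta` the true values of the group are reported, otherwise every bit is complemented. The function names `randomize_group`, `estimate_pattern`, and `demo`, the value of `theta`, and the toy data are all hypothetical choices made for illustration. The estimator inverts the relation P*(E) = theta·P(E) + (1 − theta)·P(Ē), where Ē is the complemented pattern.

```python
import random
from collections import Counter


def randomize_group(values, theta, rng):
    """Disguise one attribute group (assumed scheme, for illustration only).

    With probability theta the true bits are reported; otherwise every bit
    in the group is complemented.
    """
    if rng.random() < theta:
        return tuple(values)
    return tuple(1 - v for v in values)


def estimate_pattern(disguised, pattern, theta):
    """Estimate the true fraction of records whose group equals `pattern`.

    Solves the two equations
        P*(E)  = theta * P(E)  + (1 - theta) * P(E')
        P*(E') = theta * P(E') + (1 - theta) * P(E)
    for P(E), where E' is the bitwise complement of E.
    """
    if theta == 0.5:
        raise ValueError("theta = 0.5 destroys all information")
    counts = Counter(disguised)
    n = len(disguised)
    pattern = tuple(pattern)
    complement = tuple(1 - v for v in pattern)
    p_star = counts[pattern] / n
    p_star_comp = counts[complement] / n
    return (theta * p_star - (1 - theta) * p_star_comp) / (2 * theta - 1)


def demo(theta=0.8, seed=0):
    """Toy data: 20,000 records over one group of two binary attributes."""
    rng = random.Random(seed)
    records = (
        [(1, 1)] * 6000 + [(1, 0)] * 4000 + [(0, 1)] * 5000 + [(0, 0)] * 5000
    )
    disguised = [randomize_group(r, theta, rng) for r in records]
    # The true fraction of (1, 1) is 0.3; the estimate should be close to it.
    return estimate_pattern(disguised, (1, 1), theta)
```

Under this sketch, a group of size one reduces to ordinary per-attribute randomized response, while larger groups keep the joint distribution within a group recoverable from only two observed frequencies. How the paper actually forms groups, and how it combines randomization with SMC, is not specified in the abstract and is not modeled here.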

References

  1. Agrawal D, Aggarwal C (2001) On the design and quantification of privacy preserving data mining algorithms. In: Proceedings of the 20th ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems

  2. Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: Proceedings of the 2000 ACM SIGMOD on management of data, Dallas, TX, USA, May 15–18, 2000

  3. Clifton C, Kantarcioglu M, Vaidya J, Lin X, Zhu M (2002) Tools for privacy preserving data mining. SIGKDD Explorations, December 2002

  4. Du W, Zhan Z (2002) Building decision tree classifier on private data. Workshop on privacy, security, and data mining at the 2002 IEEE International Conference on Data Mining (ICDM’02), Maebashi City, Japan, December 9

  5. Du W, Zhan Z (2003) Using randomized response techniques for privacy-preserving data mining. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining, Washington, DC, USA

  6. Evfimievski A, Srikant R, Agrawal R, Gehrke J (2002) Privacy preserving mining of association rules. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining, Edmonton, Alberta, Canada

  7. Goldwasser S (1997) Multi-party computations: past and present. In: Proceedings of the 16th annual ACM symposium on principles of distributed computing, Santa Barbara, CA, USA, August 21–24, 1997

  8. Han J, Kamber M (2001) Data mining concepts and techniques. Morgan Kaufmann Publishers, San Francisco

  9. Kantarcioglu M, Jin J, Clifton C (2004) When do data mining results violate privacy? In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2004), Seattle, WA, USA

  10. Kargupta H, Datta S, Wang Q, Sivakumar K (2003) On the privacy preserving properties of random data perturbation techniques. In: Proceedings of the 3rd IEEE international conference on data mining (ICDM), Melbourne, Florida, USA, November 19–22, 2003

  11. Lindell Y, Pinkas B (2000) Privacy preserving data mining. Advances in Cryptology—Crypto2000. Lecture Notes in Computer Science, vol 1880

  12. Meng D, Sivakumar K, Kargupta H (2004) Privacy sensitive Bayesian network parameter learning. In: Proceedings of the fourth IEEE international conference on data mining (ICDM), Brighton, UK

  13. Pinkas B (2002) Cryptographic techniques for privacy-preserving data mining. SIGKDD Explor 4(2): 12–19

  14. Rizvi S, Haritsa J (2002) Maintaining data privacy in association rule mining. In: Proceedings of the 28th VLDB conference, Hong Kong, China

  15. Sanil A, Karr A, Lin X, Reiter J (2004) Privacy preserving regression modelling via distributed computation. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), Seattle, WA, USA

  16. Subramaniam H, Wright R, Yang Z (2004) Experimental analysis of privacy-preserving statistics computation. In: Proceedings of the workshop on secure data management (held in conjunction with VLDB’04). LNCS, vol 3178. Springer, Heidelberg

  17. Teng Z, Du W (2007) A hybrid multi-group approach for privacy preserving decision tree building. In: Proceedings of the 11th Pacific-Asia conference on knowledge discovery and data mining (PAKDD 2007)

  18. Vaidya J, Clifton C (2002) Privacy preserving association rule mining in vertically partitioned data. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining, July 23–26

  19. Vaidya J, Clifton C (2003) Privacy-preserving K-means clustering over vertically partitioned data. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining, August 24–27

  20. Vaidya J, Yu H, Jiang X (2007) Privacy-preserving SVM classification. J Knowl Inf Syst 14(2): 161–178

  21. Wang K, Fung B, Yu P (2007) Handicapping attacker’s confidence: an alternative to k-anonymization. J Knowl Inf Syst 11(3): 345–368

  22. Wang K, Yu P, Chakraborty S (2004) Bottom-up generalization: a data mining solution to privacy protection. In: Proceedings of the fourth IEEE international conference on data mining (ICDM), Brighton, UK

  23. Warner S (1965) Randomized response: a survey technique for eliminating evasive answer bias. J Am Stat Assoc 60(309): 63–69

  24. Wright R, Yang Z (2004) Privacy-preserving Bayesian network structure computation on distributed heterogeneous data. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), Seattle, WA, USA

  25. Xu S, Zhang J, Han D, Wang J (2006) Singular value decomposition based data distortion strategy for privacy protection. J Knowl Inf Syst 10(3): 383–397

  26. Zhu Y, Liu L (2004) Optimal randomization for privacy preserving data mining. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining

Author information

Corresponding author

Correspondence to Zhouxuan Teng.

Additional information

This paper is the extended version of the paper [17].

About this article

Cite this article

Teng, Z., Du, W. A hybrid multi-group approach for privacy-preserving data mining. Knowl Inf Syst 19, 133–157 (2009). https://doi.org/10.1007/s10115-008-0158-y

  • DOI: https://doi.org/10.1007/s10115-008-0158-y
