A hybrid multi-group approach for privacy-preserving data mining

Teng, Zhouxuan; Du, Wenliang

doi:10.1007/s10115-008-0158-y


Regular Paper
Published: 26 August 2008

Volume 19, pages 133–157 (2009)

Knowledge and Information Systems


190 Accesses
15 Citations

Abstract

In this paper, we propose a hybrid multi-group approach for privacy-preserving data mining. We make two contributions. First, we propose a hybrid approach. Previous work has used either the randomization approach or the secure multi-party computation (SMC) approach. These two approaches have complementary features: the randomization approach is much more efficient but less accurate, while the SMC approach is less efficient but more accurate. Our hybrid approach combines the strengths of both to balance the accuracy and efficiency constraints: it achieves much better accuracy than the randomization approach at a much lower computation cost than the SMC approach. Second, we propose a multi-group scheme that gives the data miner flexible control over the balance between data mining accuracy and privacy. This scheme is motivated by the observation that existing randomization schemes, which randomize data at the level of individual attributes, can produce insufficient accuracy when the number of dimensions is high. We partition attributes into groups and develop a scheme that conducts group-based randomization to achieve better data mining accuracy. To demonstrate the effectiveness of the proposed general schemes, we have implemented them for the ID3 decision tree algorithm and for association rule mining, and we present experimental results.
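The abstract does not spell out the randomization scheme, so the following is only a rough illustration of what group-based randomization can look like. It assumes binary (0/1) attributes and applies Warner-style randomized response [23] per attribute group: a single coin toss per group decides whether all of that group's bits are kept (probability `theta`) or all are flipped together. Under that assumption, the true frequency of a pattern inside one group can be recovered from a 2 × 2 linear system. The function names, the parameter `theta`, the synthetic data, and the group layout are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple


def randomize_record(record: Sequence[int], groups: List[List[int]],
                     theta: float, rng: random.Random) -> List[int]:
    """Disguise one record of binary (0/1) attributes, group by group.

    One coin toss per group: with probability theta the group's bits are
    kept; otherwise every bit in that group is flipped together.
    """
    out = list(record)
    for group in groups:
        if rng.random() >= theta:  # true with probability 1 - theta
            for i in group:
                out[i] = 1 - out[i]
    return out


def estimate_fraction(disguised: Sequence[Sequence[int]], group: List[int],
                      pattern: Tuple[int, ...], theta: float) -> float:
    """Estimate the true fraction of records whose attributes in `group`
    equal `pattern`, using only the disguised records.

    With E the pattern and ~E its bitwise complement, the disguised
    frequencies satisfy
        P*(E)  = theta * P(E) + (1 - theta) * P(~E)
        P*(~E) = (1 - theta) * P(E) + theta * P(~E)
    and this 2x2 system is solved for P(E).
    """
    if theta == 0.5:
        raise ValueError("theta = 0.5 makes the system singular")
    target = tuple(pattern)
    complement = tuple(1 - b for b in target)
    seen = [tuple(r[i] for i in group) for r in disguised]
    n = len(seen)
    p_e = sum(s == target for s in seen) / n
    p_not_e = sum(s == complement for s in seen) / n
    # Adding the equations gives P(E) + P(~E); subtracting them gives
    # (2*theta - 1) * (P(E) - P(~E)).
    total = p_e + p_not_e
    diff = (p_e - p_not_e) / (2 * theta - 1)
    return (total + diff) / 2


def make_data(n: int) -> List[List[int]]:
    """Hypothetical synthetic records: attributes 0-1 equal (1, 1) in 40% of
    records and (0, 0) in 10%; attributes 2-3 are filler."""
    pairs = [(1, 1)] * 4 + [(0, 0)] + [(1, 0)] * 3 + [(0, 1)] * 2
    return [[*pairs[i % 10], i % 2, (i // 3) % 2] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    original = make_data(50_000)
    groups = [[0, 1], [2, 3]]  # hypothetical partition into two groups
    disguised = [randomize_record(r, groups, 0.8, rng) for r in original]
    est = estimate_fraction(disguised, [0, 1], (1, 1), 0.8)
    print(f"estimated P(A0=1, A1=1) = {est:.3f} (true value 0.4)")
```

In this sketch, moving `theta` toward 0.5 strengthens the disguise but inflates the estimator's variance through the 1/(2θ − 1) factor. Keeping attributes that appear together in a pattern inside one group is what allows the simple 2 × 2 inversion; how the paper itself chooses groups and trades accuracy against privacy is not reproduced here.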


References

[1] Agrawal D, Aggarwal C (2001) On the design and quantification of privacy preserving data mining algorithms. In: Proceedings of the 20th ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems
[2] Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data, Dallas, TX, USA, May 15–18, 2000
[3] Clifton C, Kantarcioglu M, Vaidya J, Lin X, Zhu M (2002) Tools for privacy preserving data mining. SIGKDD Explorations, December 2002
[4] Du W, Zhan Z (2002) Building decision tree classifier on private data. In: Workshop on privacy, security, and data mining at the 2002 IEEE international conference on data mining (ICDM'02), Maebashi City, Japan, December 9, 2002
[5] Du W, Zhan Z (2003) Using randomized response techniques for privacy-preserving data mining. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining, Washington, DC, USA
[6] Evfimievski A, Srikant R, Agrawal R, Gehrke J (2002) Privacy preserving mining of association rules. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining, Edmonton, Alberta, Canada
[7] Goldwasser S (1997) Multi-party computations: past and present. In: Proceedings of the 16th annual ACM symposium on principles of distributed computing, Santa Barbara, CA, USA, August 21–24, 1997
[8] Han J, Kamber M (2001) Data mining: concepts and techniques. Morgan Kaufmann Publishers, San Francisco
[9] Kantarcioglu M, Jin J, Clifton C (2004) When do data mining results violate privacy? In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2004), Seattle, WA, USA
[10] Kargupta H, Datta S, Wang Q, Sivakumar K (2003) On the privacy preserving properties of random data perturbation techniques. In: Proceedings of the 3rd IEEE international conference on data mining (ICDM), Melbourne, Florida, USA, November 19–22, 2003
[11] Lindell Y, Pinkas B (2000) Privacy preserving data mining. In: Advances in Cryptology: CRYPTO 2000. Lecture Notes in Computer Science, vol 1880
[12] Meng D, Sivakumar K, Kargupta H (2004) Privacy sensitive Bayesian network parameter learning. In: Proceedings of the 4th IEEE international conference on data mining (ICDM), Brighton, UK
[13] Pinkas B (2002) Cryptographic techniques for privacy-preserving data mining. SIGKDD Explor 4(2): 12–19
[14] Rizvi S, Haritsa J (2002) Maintaining data privacy in association rule mining. In: Proceedings of the 28th VLDB conference, Hong Kong, China
[15] Sanil A, Karr A, Lin X, Reiter J (2004) Privacy preserving regression modelling via distributed computation. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), Seattle, WA, USA
[16] Subramaniam H, Wright R, Yang Z (2004) Experimental analysis of privacy-preserving statistics computation. In: Proceedings of the workshop on secure data management (held in conjunction with VLDB'04). LNCS, vol 3178. Springer, Heidelberg
[17] Teng Z, Du W (2007) A hybrid multi-group approach for privacy preserving decision tree building. In: Proceedings of the 11th Pacific-Asia conference on knowledge discovery and data mining (PAKDD 2007)
[18] Vaidya J, Clifton C (2002) Privacy preserving association rule mining in vertically partitioned data. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining, July 23–26, 2002
[19] Vaidya J, Clifton C (2003) Privacy-preserving K-means clustering over vertically partitioned data. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining, August 24–27, 2003
[20] Vaidya J, Yu H, Jiang X (2007) Privacy-preserving SVM classification. Knowl Inf Syst 14(2): 161–178
[21] Wang K, Fung B, Yu P (2007) Handicapping attacker's confidence: an alternative to k-anonymization. Knowl Inf Syst 11(3): 345–368
[22] Wang K, Yu P, Chakraborty S (2004) Bottom-up generalization: a data mining solution to privacy protection. In: Proceedings of the 4th IEEE international conference on data mining (ICDM), Brighton, UK
[23] Warner S (1965) Randomized response: a survey technique for eliminating evasive answer bias. J Am Stat Assoc 60(309): 63–69
[24] Wright R, Yang Z (2004) Privacy-preserving Bayesian network structure computation on distributed heterogeneous data. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), Seattle, WA, USA
[25] Xu S, Zhang J, Han D, Wang J (2006) Singular value decomposition based data distortion strategy for privacy protection. Knowl Inf Syst 10(3): 383–397
[26] Zhu Y, Liu L (2004) Optimal randomization for privacy preserving data mining. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining


Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Science, Syracuse University, 121 Link Hall, Syracuse, NY, 13244, USA
Zhouxuan Teng & Wenliang Du


Corresponding author

Correspondence to Zhouxuan Teng.

Additional information

This paper is an extended version of [17].


About this article

Cite this article

Teng, Z., Du, W. A hybrid multi-group approach for privacy-preserving data mining. Knowl Inf Syst 19, 133–157 (2009). https://doi.org/10.1007/s10115-008-0158-y


Received: 05 February 2008
Revised: 25 May 2008
Accepted: 26 May 2008
Published: 26 August 2008
Issue Date: May 2009
DOI: https://doi.org/10.1007/s10115-008-0158-y
