Abstract
We propose a hybrid multi-group approach for privacy-preserving data mining and make two contributions. First, we propose a hybrid approach. Previous work has used either the randomization approach or the secure multi-party computation (SMC) approach. These two approaches have complementary features: randomization is much more efficient but less accurate, while SMC is less efficient but more accurate. Our hybrid approach combines the strengths of both to balance accuracy and efficiency: it achieves much better accuracy than the randomization approach and a much lower computation cost than the SMC approach. Second, we propose a multi-group scheme that lets the data miner flexibly control the balance between data mining accuracy and privacy. This scheme is motivated by the fact that existing randomization schemes, which randomize data at the level of individual attributes, can yield insufficient accuracy when the number of dimensions is high. We partition the attributes into groups and develop a group-based randomization scheme that achieves better data mining accuracy. To demonstrate the effectiveness of the proposed general schemes, we have implemented them for the ID3 decision tree algorithm and for association rule mining, and we present experimental results.
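The idea of group-based randomization can be illustrated with a minimal sketch. It is not the scheme developed in the paper. It assumes binary attributes and a Warner-style randomized response applied to a whole group at once: with probability `theta` the group is reported truthfully, otherwise every bit in the group is flipped. The function names, the parameter `theta`, and the estimator below are illustrative assumptions.

```python
import random


def randomize_group(record, group, theta, rng):
    """Return a copy of `record` in which all attributes in `group` are
    flipped together with probability 1 - theta (assumed scheme)."""
    out = list(record)
    if rng.random() >= theta:
        for i in group:
            out[i] = 1 - out[i]
    return out


def randomize_record(record, groups, theta, rng):
    """Disguise a record by randomizing each attribute group independently."""
    out = list(record)
    for group in groups:
        out = randomize_group(out, group, theta, rng)
    return out


def estimate_pattern_freq(disguised, group, pattern, theta):
    """Estimate the true frequency of `pattern` on `group` from disguised data.

    With P* denoting frequencies in the disguised data and ~E the bitwise
    complement of pattern E:
        P*(E)  = theta * P(E)  + (1 - theta) * P(~E)
        P*(~E) = theta * P(~E) + (1 - theta) * P(E)
    Solving gives P(E) = (theta * P*(E) - (1 - theta) * P*(~E)) / (2*theta - 1).
    """
    if theta == 0.5:
        raise ValueError("theta = 0.5 destroys all information about the data")
    pattern = tuple(pattern)
    complement = tuple(1 - b for b in pattern)
    n = len(disguised)
    p_star = sum(tuple(r[i] for i in group) == pattern for r in disguised) / n
    p_star_c = sum(tuple(r[i] for i in group) == complement for r in disguised) / n
    return (theta * p_star - (1 - theta) * p_star_c) / (2 * theta - 1)


if __name__ == "__main__":
    rng = random.Random(42)
    groups = [(0, 1), (2, 3)]  # hypothetical partition of four attributes
    data = [[rng.randint(0, 1) for _ in range(4)] for _ in range(10000)]
    disguised = [randomize_record(r, groups, 0.8, rng) for r in data]
    print(estimate_pattern_freq(disguised, (0, 1), (1, 1), 0.8))
```

In this sketch, larger groups keep the joint distribution inside a group recoverable from only two disguised frequencies, while a `theta` closer to 0.5 gives more privacy at the price of a noisier estimate.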
References
Agrawal D, Aggarwal C (2001) On the design and quantification of privacy preserving data mining algorithms. In: Proceedings of the 20th ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems
Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: Proceedings of the 2000 ACM SIGMOD on management of data, Dallas, TX, USA, May 15–18, 2000
Clifton C, Kantarcioglu M, Vaidya J, Lin X, Zhu M (2002) Tools for privacy preserving data mining. SIGKDD Explorations, December 2002
Du W, Zhan Z (2002) Building decision tree classifier on private data. Workshop on privacy, security, and data mining at the 2002 IEEE International Conference on Data Mining (ICDM’02), Maebashi City, Japan, December 9
Du W, Zhan Z (2003) Using randomized response techniques for privacy-preserving data mining. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining, Washington, DC, USA
Evfimievski A, Srikant R, Agrawal R, Gehrke J (2002) Privacy preserving mining of association rules. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining, Edmonton, Alberta, Canada
Goldwasser S (1997) Multi-party computations: past and present. In: Proceedings of the 16th annual ACM symposium on principles of distributed computing, Santa Barbara, CA, USA, August 21–24, 1997
Han J, Kamber M (2001) Data mining: concepts and techniques. Morgan Kaufmann Publishers, San Francisco
Kantarcioglu M, Jin J, Clifton C (2004) When do data mining results violate privacy? In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2004), Seattle, WA, USA
Kargupta H, Datta S, Wang Q, Sivakumar K (2003) On the privacy preserving properties of random data perturbation techniques. In: Proceedings of the 3rd IEEE international conference on data mining (ICDM), Melbourne, Florida, USA, November 19–22, 2003
Lindell Y, Pinkas B (2000) Privacy preserving data mining. Advances in Cryptology—Crypto2000. Lecture Notes in Computer Science, vol 1880
Meng D, Sivakumar K, Kargupta H (2004) Privacy sensitive Bayesian network parameter learning. In: Proceedings of the fourth IEEE international conference on data mining (ICDM), Brighton, UK
Pinkas B (2002) Cryptographic techniques for privacy-preserving data mining. SIGKDD Explor 4(2): 12–19
Rizvi S, Haritsa J (2002) Maintaining data privacy in association rule mining. In: Proceedings of the 28th VLDB conference, Hong Kong, China
Sanil A, Karr A, Lin X, Reiter J (2004) Privacy preserving regression modelling via distributed computation. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), Seattle, WA, USA
Subramaniam H, Wright R, Yang Z (2004) Experimental analysis of privacy-preserving statistics computation. In: Proceedings of the workshop on secure data management (held in conjunction with VLDB’04). LNCS, vol 3178. Springer, Heidelberg
Teng Z, Du W (2007) A hybrid multi-group approach for privacy preserving decision tree building. In: Proceedings of the 11th Pacific-Asia conference on knowledge discovery and data mining (PAKDD 2007)
Vaidya J, Clifton C (2002) Privacy preserving association rule mining in vertically partitioned data. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining, July 23–26
Vaidya J, Clifton C (2003) Privacy-preserving K-means clustering over vertically partitioned data. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining, August 24–27
Vaidya J, Yu H, Jiang X (2007) Privacy-preserving SVM classification. J Knowl Inf Syst 14(2): 161–178
Wang K, Fung B, Yu P (2007) Handicapping attacker’s confidence: an alternative to k-anonymization. J Knowl Inf Syst 11(3): 345–368
Wang K, Yu P, Chakraborty S (2004) Bottom-up generalization: a data mining solution to privacy protection. In: Proceedings of the fourth IEEE international conference on data mining (ICDM), Brighton, UK
Warner S (1965) Randomized response: a survey technique for eliminating evasive answer bias. J Am Stat Assoc 60(309): 63–69
Wright R, Yang Z (2004) Privacy-preserving Bayesian network structure computation on distributed heterogeneous data. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), Seattle, WA, USA
Xu S, Zhang J, Han D, Wang J (2006) Singular value decomposition based data distortion strategy for privacy protection. J Knowl Inf Syst 10(3): 383–397
Zhu Y, Liu L (2004) Optimal randomization for privacy preserving data mining. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining
Additional information
This paper is an extended version of Teng and Du (2007) [17].
About this article
Cite this article
Teng, Z., Du, W. A hybrid multi-group approach for privacy-preserving data mining. Knowl Inf Syst 19, 133–157 (2009). https://doi.org/10.1007/s10115-008-0158-y
DOI: https://doi.org/10.1007/s10115-008-0158-y