Privacy-Preserving Data Mining through Knowledge Model Sharing

Sharkey, Patrick; Tian, Hongwei; Zhang, Weining; Xu, Shouhuai

doi:10.1007/978-3-540-78478-4_6

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4890)

Abstract

Privacy-preserving data mining (PPDM) is an important topic in both industry and academia. In general, there are two approaches to PPDM: one is statistics-based and the other is crypto-based. The statistics-based approach has the advantage of being efficient enough to handle large volumes of data. The basic idea underlying this approach is to let the data owners publish sanitized versions of their data (e.g., via perturbation, generalization, or ℓ-diversification), which are then used for extracting useful knowledge models such as decision trees. In this paper, we present a new method for statistics-based PPDM. Our method differs from the existing ones in that it lets the data owners share with each other the knowledge models extracted from their own private datasets, rather than publish any of their private datasets, even in sanitized form. The knowledge models derived from the individual datasets are used to generate pseudo-data, which are then used for extracting the desired “global” knowledge models. While this approach is useful, some technical subtleties need to be carefully addressed. Specifically, we propose an algorithm for generating pseudo-data according to the paths of a decision tree, a method for adapting anonymity measures of datasets to measure the privacy of decision trees, and an algorithm that prunes a decision tree to satisfy a given anonymity requirement. Through an empirical study, we show that predictive models learned using our method are significantly more accurate than those learned using the existing ℓ-diversity method in both centralized and distributed environments with different types of datasets, predictive models, and utility measures.
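The abstract does not give the algorithms themselves, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes a hypothetical `Node` type for a categorical decision tree whose leaves record how many training records reached them, generates pseudo-data by fixing the attribute values along each root-to-leaf path and sampling the remaining attributes uniformly, and uses a simple stand-in for anonymity-driven pruning: any subtree with a leaf supported by fewer than `k` records is collapsed into a single leaf. All names (`Node`, `paths`, `generate_pseudo_data`, `prune`) and the uniform-sampling and leaf-count choices are assumptions made for this example.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node. A leaf has attribute=None and carries a label and count."""
    attribute: Optional[str] = None
    children: Dict[str, "Node"] = field(default_factory=dict)
    label: Optional[str] = None
    count: int = 0  # number of training records that reached this leaf


def paths(node: Node, prefix: Tuple = ()) -> Iterator[Tuple[dict, str, int]]:
    """Yield (conditions, label, count) for every root-to-leaf path."""
    if node.attribute is None:
        yield dict(prefix), node.label, node.count
        return
    for value, child in node.children.items():
        yield from paths(child, prefix + ((node.attribute, value),))


def generate_pseudo_data(tree: Node, domains: Dict[str, List[str]],
                         rng: random.Random) -> List[dict]:
    """Emit `count` pseudo-records per path.

    Attributes tested on the path take the path's value; all other attributes
    are sampled uniformly from their domain (an assumption of this sketch).
    """
    records = []
    for conditions, label, count in paths(tree):
        for _ in range(count):
            row = {}
            for attr, values in domains.items():
                row[attr] = conditions[attr] if attr in conditions else rng.choice(values)
            row["class"] = label
            records.append(row)
    return records


def prune(node: Node, k: int) -> Node:
    """Bottom-up pruning (modifies the tree in place and returns the new root).

    If any direct child is a leaf with fewer than k records, the whole subtree
    is replaced by one leaf holding the summed count and the majority label.
    """
    if node.attribute is None:
        return node
    node.children = {v: prune(c, k) for v, c in node.children.items()}
    small_leaf = any(c.attribute is None and c.count < k
                     for c in node.children.values())
    if not small_leaf:
        return node
    totals: Dict[str, int] = {}
    for _, label, count in paths(node):
        totals[label] = totals.get(label, 0) + count
    majority = max(totals, key=totals.get)
    return Node(label=majority, count=sum(totals.values()))
```

In a distributed setting, each data owner could in principle run something like `prune` on a locally learned tree before sharing it, and the recipient could pool the outputs of `generate_pseudo_data` from all shared trees to train a global model. How the paper actually measures tree anonymity and chooses what to prune is not stated in this abstract.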

This work was supported in part by NSF research grant IIS-0524612.

Author information

Authors and Affiliations

Department of Computer Science, University of Texas at San Antonio,
Patrick Sharkey, Hongwei Tian, Weining Zhang & Shouhuai Xu

Editor information

Francesco Bonchi, Elena Ferrari, Bradley Malin, Yücel Saygin

About this paper

Cite this paper

Sharkey, P., Tian, H., Zhang, W., Xu, S. (2008). Privacy-Preserving Data Mining through Knowledge Model Sharing. In: Bonchi, F., Ferrari, E., Malin, B., Saygin, Y. (eds) Privacy, Security, and Trust in KDD. PInKDD 2007. Lecture Notes in Computer Science, vol 4890. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78478-4_6

DOI: https://doi.org/10.1007/978-3-540-78478-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78477-7
Online ISBN: 978-3-540-78478-4
eBook Packages: Computer Science, Computer Science (R0)
