Providing k-anonymity in data mining

Friedman, Arik; Wolff, Ran; Schuster, Assaf

doi:10.1007/s00778-006-0039-5


Regular Paper
Published: 10 January 2007

Volume 17, pages 789–804 (2008)

The VLDB Journal



Abstract

In this paper we present extended definitions of k-anonymity and use them to prove that a given data mining model does not violate the k-anonymity of the individuals represented in the learning examples. Our extension provides a tool that measures the amount of anonymity retained during data mining. We show that our model can be applied to various data mining problems, such as classification, association rule mining and clustering. We describe two data mining algorithms which exploit our extension to guarantee they will generate only k-anonymous output, and provide experimental results for one of them. Finally, we show that our method contributes new and efficient ways to anonymize data and preserve patterns during anonymization.
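The extended definitions start from the standard table-level notion of k-anonymity: every combination of quasi-identifier values that appears in a released table must be shared by at least k records. The sketch below illustrates only that baseline definition, not the authors' extended definitions for data mining models. It computes the size of the smallest equivalence class over the quasi-identifiers, which is the largest k for which the table is k-anonymous. The function names `anonymity_level` and `is_k_anonymous`, the attribute names and the toy rows are all hypothetical and chosen for illustration.

```python
from collections import Counter
from typing import Hashable, Mapping, Sequence


def anonymity_level(rows: Sequence[Mapping[str, Hashable]],
                    quasi_identifiers: Sequence[str]) -> int:
    """Return the size of the smallest equivalence class over the quasi-identifiers.

    Hypothetical helper for the standard (table-level) definition only:
    a table is k-anonymous exactly when this value is at least k.
    """
    if not rows:
        raise ValueError("cannot measure anonymity of an empty table")
    # Project each row onto the quasi-identifier attributes and count
    # how many rows share each projected combination (equivalence class).
    class_sizes = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
    return min(class_sizes.values())


def is_k_anonymous(rows: Sequence[Mapping[str, Hashable]],
                   quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every quasi-identifier combination occurs in at least k rows."""
    return anonymity_level(rows, quasi_identifiers) >= k


# Hypothetical, already-generalized toy table; "disease" is the private attribute.
rows = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "cold"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
]
quasi_identifiers = ["zip", "age"]

print(anonymity_level(rows, quasi_identifiers))      # 2
print(is_k_anonymous(rows, quasi_identifiers, k=3))  # False
```

Here the rows fall into two equivalence classes of sizes 2 and 3, so the toy table is 2-anonymous but not 3-anonymous.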



Author information

Authors and Affiliations

Computer Science Department, Technion—Israel Institute of Technology, Haifa, Israel
Arik Friedman & Ran Wolff
Management Information Systems Department, Haifa University, Haifa, Israel
Assaf Schuster


Corresponding author

Correspondence to Arik Friedman.


About this article

Cite this article

Friedman, A., Wolff, R. & Schuster, A. Providing k-anonymity in data mining. The VLDB Journal 17, 789–804 (2008). https://doi.org/10.1007/s00778-006-0039-5


Received: 30 September 2005
Revised: 24 May 2006
Accepted: 02 August 2006
Published: 10 January 2007
Issue Date: July 2008
DOI: https://doi.org/10.1007/s00778-006-0039-5

Similar content being viewed by others

K-Anonymity Algorithm Based on Improved Clustering

Data Anonymization Through Multi-modular Clustering

An Efficient k-Anonymization Algorithm with Low Information Loss
