Providing k-anonymity in data mining

  • Regular Paper
  • The VLDB Journal

Abstract

In this paper we present extended definitions of k-anonymity and use them to prove that a given data mining model does not violate the k-anonymity of the individuals represented in the learning examples. Our extension provides a tool that measures the amount of anonymity retained during data mining. We show that our model can be applied to various data mining problems, such as classification, association rule mining and clustering. We describe two data mining algorithms which exploit our extension to guarantee they will generate only k-anonymous output, and provide experimental results for one of them. Finally, we show that our method contributes new and efficient ways to anonymize data and preserve patterns during anonymization.
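For orientation, the sketch below checks the standard table-level notion of k-anonymity that the abstract builds on: every combination of quasi-identifier values must be shared by at least k records. It does not implement the extended, model-level definitions this paper introduces. The function names, the toy table, and the choice of quasi-identifier columns are all hypothetical and serve only as illustration.

```python
from collections import Counter


def anonymity_level(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    quasi-identifier values. Convention chosen here (an assumption): an
    empty table reports level 0."""
    if not rows:
        return 0
    groups = Counter(tuple(row[attr] for attr in quasi_identifiers) for row in rows)
    return min(groups.values())


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    return anonymity_level(rows, quasi_identifiers) >= k


# Hypothetical, already-generalized toy table (not data from the paper).
records = [
    {"zip": "130**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "130**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "148**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "148**", "age": "30-39", "diagnosis": "diabetes"},
    {"zip": "148**", "age": "30-39", "diagnosis": "flu"},
]
QUASI_IDENTIFIERS = ["zip", "age"]  # assumed quasi-identifier columns

level = anonymity_level(records, QUASI_IDENTIFIERS)
print(f"anonymity level: {level}")
print(f"2-anonymous: {is_k_anonymous(records, QUASI_IDENTIFIERS, 2)}")
print(f"3-anonymous: {is_k_anonymous(records, QUASI_IDENTIFIERS, 3)}")
```

The smallest group in this toy table has two records, so the table is 2-anonymous but not 3-anonymous. A check like this applies to released tables only; the paper's contribution is to reason about the anonymity of data mining models built from such tables.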

Author information

Correspondence to Arik Friedman.

About this article

Cite this article

Friedman, A., Wolff, R. & Schuster, A. Providing k-anonymity in data mining. The VLDB Journal 17, 789–804 (2008). https://doi.org/10.1007/s00778-006-0039-5

  • DOI: https://doi.org/10.1007/s00778-006-0039-5
