Anonymity preserving pattern discovery

  • Regular Paper
The VLDB Journal

Abstract

It is generally believed that data mining results do not violate the anonymity of the individuals recorded in the source database. The rationale is that data mining models and patterns, in order to ensure the required statistical significance, represent a large number of individuals and thus conceal individual identities; the minimum support threshold in frequent pattern mining is a case in point. In this paper we show that this belief is ill-founded. By shifting the concept of k-anonymity from the source data to the extracted patterns, we formally characterize the notion of a threat to anonymity in the context of pattern discovery, and provide a methodology to efficiently and effectively identify all possible threats that arise from disclosing the set of extracted patterns. On this basis, we obtain a formal notion of privacy protection that allows the extracted knowledge to be disclosed while protecting the anonymity of the individuals in the source database. Moreover, to handle the cases where threats to anonymity cannot be avoided, we study how to eliminate them by means of controlled pattern (not data!) distortion.
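The following is a minimal sketch of why a minimum support threshold alone does not guarantee anonymity. It rests on assumptions of ours, not on the paper's algorithm, which the abstract does not spell out: the disclosed output is the full set of frequent itemsets with their exact supports, and a "threat" is taken to be any pattern with some items present (I) and others absent (N) whose support, derived by inclusion-exclusion as sup(I ∧ ¬N) = Σ_{S ⊆ N} (-1)^|S| sup(I ∪ S), lies strictly between 0 and k. The function names (`frequent_itemsets`, `derived_support`, `anonymity_threats`) and the toy database are hypothetical, and the brute-force enumeration is for illustration only.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_sup):
    """Brute-force support counting: every itemset with support >= min_sup.

    Exponential in the number of items; only meant for toy data.
    """
    items = sorted(set().union(*transactions))
    supports = {}
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            itemset = frozenset(combo)
            s = sum(1 for t in transactions if itemset <= t)
            if s >= min_sup:
                supports[itemset] = s
    return supports


def derived_support(supports, positive, negative):
    """Support of 'all items in positive present AND all items in negative absent'.

    Computed only from the disclosed supports by inclusion-exclusion:
    sum over every subset S of negative of (-1)^|S| * sup(positive | S).
    Assumes positive is non-empty and every such itemset is in supports.
    """
    positive = frozenset(positive)
    negative = sorted(negative)
    total = 0
    for r in range(len(negative) + 1):
        for extra in combinations(negative, r):
            total += (-1) ** r * supports[positive | frozenset(extra)]
    return total


def anonymity_threats(supports, k):
    """Patterns (I present, J minus I absent) whose derived support is in (0, k).

    Frequent itemsets are downward closed, so every itemset between I and J
    is guaranteed to be among the disclosed supports.
    """
    threats = []
    for J in supports:
        for r in range(1, len(J)):  # non-empty proper subsets I of J
            for combo in combinations(sorted(J), r):
                I = frozenset(combo)
                negated = J - I
                s = derived_support(supports, I, negated)
                if 0 < s < k:
                    threats.append((I, negated, s))
    return threats


if __name__ == "__main__":
    # Hypothetical toy database: 10 transactions over items a, b, c.
    db = [{"a", "b", "c"}] * 5 + [{"a", "b"}] + [{"c"}] * 4
    sup = frequent_itemsets(db, min_sup=3)
    for pos, neg, s in anonymity_threats(sup, k=3):
        print(sorted(pos), "without", sorted(neg), "->", s)
```

On this toy database every itemset over a, b and c has support at least 3, so each disclosed pattern individually meets the threshold. Yet subtracting sup({a, b, c}) = 5 from sup({a, b}) = 6 reveals that exactly one transaction contains a and b but not c, which is the kind of inference the abstract describes.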

Author information

Correspondence to Francesco Bonchi.

About this article

Cite this article

Atzori, M., Bonchi, F., Giannotti, F. et al. Anonymity preserving pattern discovery. The VLDB Journal 17, 703–727 (2008). https://doi.org/10.1007/s00778-006-0034-x

  • DOI: https://doi.org/10.1007/s00778-006-0034-x
