Maximum Entropy Oriented Anonymization Algorithm for Privacy Preserving Data Mining

  • Conference paper

Abstract

This work introduces a new concept that addresses the problem of preserving privacy when anonymising and publishing collections of personal data. In particular, a maximum entropy oriented algorithm for protecting sensitive data is proposed. In contrast to k-anonymity, ℓ-diversity and t-closeness, the proposed algorithm builds equivalence classes whose sensitive attribute values are as close to uniformly distributed as possible, if necessary by adding noise, and whose entropy is bounded below by the entropy of the distribution of the initial data collection. The aim is that background information cannot be exploited to successfully attack the privacy of the data subjects to whom the data refer. Existing privacy and information loss metrics are also presented, together with the algorithm that implements the maximum entropy anonymity concept. From a privacy protection perspective, the results achieved are very promising, while the resulting information loss is limited.
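The abstract states that each equivalence class should have a sensitive-attribute entropy no lower than that of the initial data collection. Below is a minimal Python sketch of that check alone. It assumes Shannon entropy in bits over the empirical distribution of a single categorical sensitive attribute. The function names `shannon_entropy` and `classes_meet_entropy_floor` are hypothetical. The sketch does not reproduce the authors' procedure for forming classes or adding noise.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, Iterable, List, Sequence


def shannon_entropy(values: Iterable[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())


def classes_meet_entropy_floor(
    sensitive: Sequence[Hashable],
    class_ids: Sequence[Hashable],
    tol: float = 1e-12,
) -> Dict[Hashable, bool]:
    """Hypothetical check (not the authors' code): for each equivalence class,
    report whether the entropy of its sensitive values is at least the entropy
    of the sensitive values over the whole collection."""
    if len(sensitive) != len(class_ids):
        raise ValueError("sensitive and class_ids must have equal length")
    floor = shannon_entropy(sensitive)  # entropy of the initial collection
    groups: Dict[Hashable, List[Hashable]] = {}
    for cid, value in zip(class_ids, sensitive):
        groups.setdefault(cid, []).append(value)
    # `tol` absorbs floating-point rounding when a class matches the floor exactly
    return {cid: shannon_entropy(vals) >= floor - tol for cid, vals in groups.items()}


if __name__ == "__main__":
    sensitive = ["flu", "flu", "hiv", "cancer", "flu", "hiv"]
    classes = [1, 1, 1, 2, 2, 2]
    print(classes_meet_entropy_floor(sensitive, classes))
```

For example, with these sensitive values split into two classes, the whole collection has an entropy of about 1.46 bits. Class 1 (`flu`, `flu`, `hiv`) reaches only about 0.92 bits and fails the check, while class 2 (`cancer`, `flu`, `hiv`) reaches about 1.58 bits and passes. The size-weighted average of the within-class entropies can never exceed the entropy of the whole collection. Without noise, therefore, every class can meet this floor only when each class reproduces the collection's distribution exactly, which may be one reason the abstract mentions noise.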

Copyright information

© 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Tsiafoulis, S.G., Zorkadis, V.C., Pimenidis, E. (2012). Maximum Entropy Oriented Anonymization Algorithm for Privacy Preserving Data Mining. In: Georgiadis, C.K., Jahankhani, H., Pimenidis, E., Bashroush, R., Al-Nemrat, A. (eds) Global Security, Safety and Sustainability & e-Democracy. e-Democracy ICGS3 2011 2011. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 99. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33448-1_2

  • DOI: https://doi.org/10.1007/978-3-642-33448-1_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33447-4

  • Online ISBN: 978-3-642-33448-1

  • eBook Packages: Computer Science, Computer Science (R0)