Towards k-Anonymous Non-numerical Data via Semantic Resampling

  • Conference paper
Advances in Computational Intelligence (IPMU 2012)

Abstract

Privacy should be carefully considered when publishing data (e.g. database records) collected from individuals, to avoid disclosing identities or revealing confidential information. Anonymisation methods aim to achieve a certain degree of privacy by transforming non-anonymous data while minimising, as much as possible, the distortion (i.e. information loss) that these transformations cause. k-anonymity is a property typically considered when masking data; it states that each record (corresponding to an individual) is indistinguishable from at least k-1 other records in the anonymised dataset. Many methods have been developed to anonymise data, but most focus solely on numerical attributes. Non-numerical values (e.g. categorical attributes such as job or country-of-birth, or unbounded textual ones such as user preferences) are more challenging because arithmetic operations cannot be applied to them. Managing and interpreting this kind of data properly requires operators that can deal with data semantics. In this paper, we propose an anonymisation method based on a classic data re-sampling algorithm that guarantees the fulfilment of the k-anonymity property and is able to deal with non-numerical data from a semantic perspective. Our method has been applied to anonymise the well-known Adult Census dataset, showing that a semantic interpretation of non-numerical values better minimises the information loss of the masked data file.
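
The k-anonymity property stated above can be illustrated with a minimal check. This is only a sketch and not the method proposed in the paper: it assumes that "indistinguishable" means identical values on every attribute considered, and the function name `is_k_anonymous` and the sample records are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def is_k_anonymous(records: Iterable[Sequence[Hashable]], k: int) -> bool:
    """Return True if every distinct record occurs at least k times.

    Assumption: two records are indistinguishable when all of their
    attribute values are identical.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    # Count how many times each combination of attribute values occurs.
    counts = Counter(tuple(record) for record in records)
    return all(count >= k for count in counts.values())


# Hypothetical masked records over two non-numerical attributes
# (job, country-of-birth); the values are illustrative only.
masked = [
    ("engineer", "Spain"),
    ("engineer", "Spain"),
    ("teacher", "France"),
    ("teacher", "France"),
]

print(is_k_anonymous(masked, 2))  # True: each record has one identical peer
print(is_k_anonymous(masked, 3))  # False: no group reaches three records
```

Here each record shares its values with exactly one other record, so the sample satisfies the property for k = 2 but not for k = 3.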

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Martínez, S., Sánchez, D., Valls, A. (2012). Towards k-Anonymous Non-numerical Data via Semantic Resampling. In: Greco, S., Bouchon-Meunier, B., Coletti, G., Fedrizzi, M., Matarazzo, B., Yager, R.R. (eds) Advances in Computational Intelligence. IPMU 2012. Communications in Computer and Information Science, vol 300. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31724-8_54

  • DOI: https://doi.org/10.1007/978-3-642-31724-8_54

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31723-1

  • Online ISBN: 978-3-642-31724-8

  • eBook Packages: Computer Science, Computer Science (R0)
