Semantic Anonymisation of Categorical Datasets

Chapter in Advanced Research in Data Privacy

Part of the book series: Studies in Computational Intelligence ((SCI,volume 567))

Abstract

The exploitation of microdata compiled by statistical agencies is of great interest for the data mining community. However, such data often include sensitive information that can be directly or indirectly related to individuals. Hence, an appropriate anonymisation process is needed to minimise the risk of disclosing identities and/or confidential data. In the past, many anonymisation methods have been developed to deal with numerical data, but approaches tackling the anonymisation of non-numerical values (e.g. categorical, textual) are scarce and shallow. Since the utility of this kind of information is closely related to the preservation of its meaning, in this work, the notion of semantic similarity is used to enable a semantically coherent anonymisation. The knowledge modelled in ontologies is used as the basic pillar to propose semantic operators that enable an accurate management and transformation of categorical attributes. These operators are then used in three anonymisation mechanisms: Semantic Recoding, Semantic and Adaptive Microaggregation and Semantic Resampling. The three algorithms are compared in terms of semantic utility, privacy disclosure risk and runtime, with encouraging results.
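
To make the idea of ontology-based semantic similarity concrete, the following is a minimal sketch, not the chapter's actual operators. It assumes a toy is-a taxonomy and a simple edge-counting distance, then picks a "semantic centroid" for a group of categorical values and recodes the group to it. The taxonomy contents and the names `PARENT`, `path_distance`, `semantic_centroid` and `recode` are hypothetical and chosen only for illustration.

```python
# Toy is-a taxonomy (child -> parent). Purely illustrative; not taken from the chapter.
PARENT = {
    "flu": "respiratory disease",
    "pneumonia": "respiratory disease",
    "respiratory disease": "disease",
    "gastritis": "digestive disease",
    "digestive disease": "disease",
    "disease": None,
}


def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def path_distance(a, b):
    """Count the is-a edges on the shortest path between a and b."""
    steps_from_b = {c: i for i, c in enumerate(ancestors(b))}
    for steps_from_a, c in enumerate(ancestors(a)):
        if c in steps_from_b:
            # c is the closest common ancestor of a and b
            return steps_from_a + steps_from_b[c]
    raise ValueError("concepts share no common ancestor")


def semantic_centroid(values):
    """Pick the taxonomy concept with the smallest total distance to the values.

    Ties are broken alphabetically (an arbitrary assumption of this sketch).
    """
    return min(sorted(PARENT), key=lambda c: sum(path_distance(c, v) for v in values))


def recode(values):
    """Replace every value in a group by the group's semantic centroid."""
    centroid = semantic_centroid(values)
    return [centroid] * len(values)
```

Under these assumptions, a group containing "flu", "pneumonia" and "gastritis" is recoded to "respiratory disease", the concept closest to the three values in the toy taxonomy. A real deployment would replace the dictionary with a domain ontology and could use a different similarity measure.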

Acknowledgments

This work has been supported by the Spanish Ministry of Science and Innovation (through projects ICWT TIN2012-32757, ARES-CONSOLIDER INGENIO 2010 CSD2007-00004 and BallotNext IPT-2012-0603-430000) and by the Government of Catalonia under grants 2009 SGR 1135 and 2009 SGR-01523. Dr. Martínez was supported with research grants by the Universitat Rovira i Virgili and Ministerio de Educación y Ciencia (Spain).

Author information

Corresponding author

Correspondence to Sergio Martínez.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Martínez, S., Valls, A., Sánchez, D. (2015). Semantic Anonymisation of Categorical Datasets. In: Navarro-Arribas, G., Torra, V. (eds) Advanced Research in Data Privacy. Studies in Computational Intelligence, vol 567. Springer, Cham. https://doi.org/10.1007/978-3-319-09885-2_7

  • DOI: https://doi.org/10.1007/978-3-319-09885-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09884-5

  • Online ISBN: 978-3-319-09885-2

  • eBook Packages: Engineering, Engineering (R0)
