Towards k-Anonymous Non-numerical Data via Semantic Resampling

Martínez, Sergio; Sánchez, David; Valls, Aïda

doi:10.1007/978-3-642-31724-8_54

Part of the book series: Communications in Computer and Information Science (CCIS, volume 300)

Included in the following conference series:

International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems

Abstract

Privacy must be carefully considered when publishing data (e.g. database records) collected from individuals, to avoid disclosing identities or revealing confidential information. Anonymisation methods aim to achieve a certain degree of privacy by transforming non-anonymous data while minimising, as far as possible, the distortion (i.e. information loss) that these transformations introduce. k-anonymity is a property typically considered when masking data: it states that each record (corresponding to an individual) is indistinguishable from at least k-1 other records in the anonymised dataset. Many anonymisation methods have been developed, but most focus solely on numerical attributes. Non-numerical values (e.g. categorical attributes such as job or country-of-birth, or unbounded textual ones such as user preferences) are more challenging because arithmetic operations cannot be applied to them. Managing and interpreting this kind of data properly requires operators that can deal with data semantics. In this paper, we propose an anonymisation method based on a classic data re-sampling algorithm that guarantees k-anonymity and can deal with non-numerical data from a semantic perspective. We applied our method to the well-known Adult Census dataset; the results show that a semantic interpretation of non-numerical values better minimises the information loss of the masked data file.
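The k-anonymity property stated in the abstract can be illustrated with a minimal check. This is only a sketch and not the authors' method. It assumes that indistinguishability is judged on a chosen set of quasi-identifier attributes, and the function name `is_k_anonymous`, the attribute names and the toy records are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    occurs in at least k records (assumed reading of k-anonymity)."""
    # Count how many records share each tuple of quasi-identifier values.
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    # Each record must sit in a group of size >= k,
    # i.e. be indistinguishable from at least k-1 others.
    return all(count >= k for count in groups.values())


# Hypothetical masked records with two non-numerical attributes.
masked = [
    {"job": "professional", "country": "Europe"},
    {"job": "professional", "country": "Europe"},
    {"job": "manual worker", "country": "Asia"},
    {"job": "manual worker", "country": "Asia"},
]

print(is_k_anonymous(masked, ["job", "country"], k=2))  # True
print(is_k_anonymous(masked, ["job", "country"], k=3))  # False
```

The check only verifies the property on an already masked file. It says nothing about how the masked values were produced or how much information was lost in producing them.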

Author information

Authors and Affiliations

Department of Computer Science and Mathematics, Universitat Rovira i Virgili, Av. Països Catalans, 26, 43007, Tarragona, Catalonia, Spain
Sergio Martínez, David Sánchez & Aïda Valls

Editor information

Editors and Affiliations

Faculty of Economics, University of Catania, Corso Italia, 55, 95129, Catania, Italy
Salvatore Greco & Benedetto Matarazzo
CNRS UMR 7606, DAPA, LIP6 8, Université Pierre et Marie Curie - Paris6, rue du Capitaine Scott, F-75015, Paris, France
Bernadette Bouchon-Meunier
Dip. Matematica e Informatica, Università di Perugia, 06123, Perugia, Italy
Giulianella Coletti
Department of Computer and Management Science, University of Trento, Via Inama 5, 38122, Trento, Italy
Mario Fedrizzi
Machine Intelligence Institute, IONA College, 10801, New Rochelle, NY, USA
Ronald R. Yager

Cite this paper

Martínez, S., Sánchez, D., Valls, A. (2012). Towards k-Anonymous Non-numerical Data via Semantic Resampling. In: Greco, S., Bouchon-Meunier, B., Coletti, G., Fedrizzi, M., Matarazzo, B., Yager, R.R. (eds) Advances in Computational Intelligence. IPMU 2012. Communications in Computer and Information Science, vol 300. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31724-8_54

DOI: https://doi.org/10.1007/978-3-642-31724-8_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31723-1
Online ISBN: 978-3-642-31724-8
eBook Packages: Computer Science, Computer Science (R0)
