Anonymization of Data Sets with NULL Values

Ciglic, Margareta; Eder, Johann; Koncilia, Christian

doi:10.1007/978-3-662-49214-7_7

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 9510)

Abstract

Releasing, publishing or transferring microdata is restricted by the need to protect the privacy of data owners. k-anonymity is one of the most widespread concepts for anonymizing microdata, but it does not explicitly cover NULL values, which are nevertheless frequently found in microdata. We study the problem of NULL values (missing values, non-applicable attributes, etc.) for anonymization in detail, present a set of new definitions of k-anonymity that explicitly consider NULL values, and analyze which definition protects against which attacks. We show that an adequate treatment of missing values in microdata can be achieved easily by extending generalization algorithms. In particular, we show how the proposed treatment of NULL values was incorporated into the anonymization tool ANON, which implements generalization and tuple suppression with an application-specific definition of information loss. In a series of experiments, we show that NULL-aware generalization algorithms incur less information loss than standard algorithms.
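The abstract does not spell out the new definitions, so the following is only a minimal sketch of why the handling of NULL matters for a k-anonymity check. It assumes two illustrative matching rules that are not taken from the chapter: a strict rule in which NULL is just another value, and a wildcard rule in which NULL is compatible with any value. The function name `is_k_anonymous`, the flag `null_matches_any`, and the sample table are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

Row = dict[str, Optional[str]]


def is_k_anonymous(rows: Sequence[Row], qi: Sequence[str], k: int,
                   null_matches_any: bool = False) -> bool:
    """Check k-anonymity over the quasi-identifier attributes `qi`.

    Both matching rules are assumptions made for illustration only.
    """
    if not null_matches_any:
        # Strict rule: None is an ordinary value, so rows form equivalence
        # classes and every class must contain at least k rows.
        groups = Counter(tuple(row[a] for a in qi) for row in rows)
        return all(size >= k for size in groups.values())

    def matches(r: Row, s: Row) -> bool:
        # Wildcard rule: None is compatible with any value.
        return all(r[a] is None or s[a] is None or r[a] == s[a] for a in qi)

    # The wildcard relation is not transitive, so count matches per row
    # instead of building equivalence classes.
    return all(sum(matches(r, s) for s in rows) >= k for r in rows)


# Hypothetical sample table; "zip" and "age" act as quasi-identifiers.
ROWS: list[Row] = [
    {"zip": "9020", "age": "30-39", "disease": "flu"},
    {"zip": "9020", "age": None, "disease": "asthma"},
    {"zip": "9020", "age": "30-39", "disease": "flu"},
    {"zip": "9500", "age": None, "disease": "diabetes"},
    {"zip": "9500", "age": None, "disease": "flu"},
]
QI = ["zip", "age"]

print(is_k_anonymous(ROWS, QI, 2))                         # False
print(is_k_anonymous(ROWS, QI, 2, null_matches_any=True))  # True
```

Under the strict rule the second row forms a class of size one, so the table is not 2-anonymous and would need further generalization or suppression. Under the wildcard rule the same table passes for k = 2. Which rule is safe against which attack is exactly the kind of question the chapter analyzes; this sketch takes no position on it.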

The work was supported by the Austrian Ministry of Science and Research within the project BBMRI.AT (GZ 10.470/0016-II/3/2013) and by Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF) within the project ANON.

Author information

Authors and Affiliations

Department of Informatics Systems, Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria
Margareta Ciglic, Johann Eder & Christian Koncilia

Corresponding author

Correspondence to Johann Eder.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Linz, Austria
Josef Küng
FAW, University of Linz, Linz, Austria
Roland Wagner
Universidad Politécnica de Valencia, Valencia, Spain
Hendrik Decker
Czech Technical University, Prague, Czech Republic
Lenka Lhotska
University of Auckland, Auckland, New Zealand
Sebastian Link

About this chapter

Cite this chapter

Ciglic, M., Eder, J., Koncilia, C. (2016). Anonymization of Data Sets with NULL Values. In: Hameurlain, A., Küng, J., Wagner, R., Decker, H., Lhotska, L., Link, S. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXIV. Lecture Notes in Computer Science, vol 9510. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49214-7_7

DOI: https://doi.org/10.1007/978-3-662-49214-7_7
Published: 07 January 2016
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49213-0
Online ISBN: 978-3-662-49214-7
eBook Packages: Computer Science, Computer Science (R0)
