Improving Time Complexity and Utility of k-anonymous Microaggregation

Thaeter, Florian; Reischuk, Rüdiger

doi:10.1007/978-3-031-36840-0_10

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1795)

Included in the following conference series:

International Conference on E-Business and Telecommunications

Abstract

Research in medicine, economics and the social sciences requires data about specific individuals. Such data should therefore be publicly available, but releasing it must not violate the privacy of any individual. Microaggregation applied to databases is a standard technique to protect privacy. It clusters similar individuals into larger groups to achieve so-called k-anonymity: every individual is hidden in a cluster of size at least k. The data can then be made public for all kinds of analysis, whereas other concepts such as differential privacy keep the database secret and allow outsiders to ask only specific questions about the data.
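To make the idea concrete, here is a minimal sketch of the release step, assuming a clustering has already been computed. The function name `microaggregate`, the toy records and the choice of the cluster mean as the released value are illustrative assumptions, not taken from the chapter.

```python
from collections import defaultdict


def microaggregate(records, labels, k):
    """Replace every record by the centroid of its cluster.

    records: list of equal-length numeric tuples (hypothetical toy data)
    labels:  labels[i] is the cluster id assigned to records[i]
    Raises ValueError if some cluster has fewer than k members,
    i.e. if the clustering is not k-anonymous.
    """
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    # k-anonymity check: every individual must hide among at least k records.
    for label, members in clusters.items():
        if len(members) < k:
            raise ValueError(
                f"cluster {label!r} has {len(members)} records, fewer than k={k}"
            )

    dim = len(records[0])
    released = [None] * len(records)
    for members in clusters.values():
        centroid = tuple(
            sum(records[i][d] for i in members) / len(members) for d in range(dim)
        )
        for i in members:
            released[i] = centroid
    return released


if __name__ == "__main__":
    data = [(1.0, 2.0), (3.0, 4.0), (10.0, 10.0), (12.0, 14.0)]
    print(microaggregate(data, labels=[0, 0, 1, 1], k=2))
```

After this step, all records of a cluster are indistinguishable in the released table, which is what hides each individual among at least k records.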

The modification of a database to achieve anonymity should be as small as possible to preserve its utility, that is, the loss of information should be minimized. In this respect microaggregation typically performs much better than other anonymization techniques such as generalization or suppression. However, minimizing the information loss of k-anonymous microaggregation is an NP-hard optimization problem for \(k \ge 3\). Not only is it unlikely that optimal solutions can be computed efficiently, but nontrivial approximations are lacking as well. Therefore, a number of heuristics, all with at least quadratic time complexity, have been developed.
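As an illustration, information loss in the microaggregation literature is commonly measured as the ratio of the within-cluster sum of squared errors to the total sum of squares. The abstract does not state the exact measure used, so the formulation below and its symbols (clusters \(C_1, \dots, C_m\), cluster centroids \(\bar{x}_j\), global centroid \(\bar{x}\)) are assumptions:

\[
\mathrm{SSE} = \sum_{j=1}^{m} \sum_{x \in C_j} \lVert x - \bar{x}_j \rVert^2, \qquad
\mathrm{SST} = \sum_{j=1}^{m} \sum_{x \in C_j} \lVert x - \bar{x} \rVert^2, \qquad
\mathrm{IL} = \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad
\text{subject to } |C_j| \ge k \text{ for all } j.
\]

Under this measure, \(\mathrm{IL} = 0\) means that the released data equals the original, and \(\mathrm{IL} = 1\) corresponds to replacing every record by the global centroid.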

This paper improves microaggregation significantly and provides a tradeoff between computational effort and utility. First, we make a detailed analysis and tuning of the maximum distance methodology – the common approach to generate a clustering that provides k-anonymity. We review the methods proposed so far and design a new algorithm \(\texttt{MDAV}^{*}_\gamma \) that gives better utility on standard benchmarks.
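For orientation, the following is a simplified sketch of the classic maximum distance (MDAV-style) heuristic from the earlier literature. It is not the \(\texttt{MDAV}^{*}_\gamma \) algorithm of this chapter, whose details the abstract does not give. The helper names and the tie handling (the second seed is chosen after the first cluster has been removed) are assumptions made for this sketch.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of numeric tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def mdav(records, k):
    """Return a list of clusters (lists of record indices), each of size >= k."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    remaining = list(range(len(records)))
    clusters = []

    def farthest_from(point):
        return max(remaining, key=lambda i: sq_dist(records[i], point))

    def take_cluster(seed):
        # The seed together with its k-1 nearest remaining neighbours.
        others = sorted(
            (i for i in remaining if i != seed),
            key=lambda i: sq_dist(records[i], records[seed]),
        )
        members = [seed] + others[: k - 1]
        chosen = set(members)
        remaining[:] = [i for i in remaining if i not in chosen]
        clusters.append(members)

    while len(remaining) >= 3 * k:
        c = centroid([records[i] for i in remaining])
        r = farthest_from(c)            # record farthest from the centroid
        take_cluster(r)
        s = farthest_from(records[r])   # remaining record farthest from r
        take_cluster(s)
    if len(remaining) >= 2 * k:
        c = centroid([records[i] for i in remaining])
        take_cluster(farthest_from(c))
    if remaining:
        clusters.append(list(remaining))  # leftover group of k .. 2k-1 records
    return clusters
```

Every iteration scans all remaining records, and about n/(2k) iterations are needed, which is where the quadratic running time of this family of heuristics comes from.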

A different approach of quadratic time complexity based on Lloyd’s algorithm has been proposed and named ONA, but not completely analysed. This paper fills this gap and improves several steps resulting in a new algorithm \(\texttt{ONA}^{*}\) with better utility.

Mondrian is another approach for clustering data that can be adapted for microaggregation. It is quite fast, but typically achieves very poor utility. We improve on this and design an almost linear time algorithm that gives acceptable utility, though worse than that of the quadratic time algorithms.
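A generic Mondrian-style partitioning can be sketched as follows: split the records recursively at the median of the attribute with the widest range, and stop as soon as a part can no longer be divided into two parts of size at least k. This is a plain textbook-style variant written for illustration, not the improved algorithm of the chapter. The function name `mondrian_clusters` is hypothetical.

```python
def mondrian_clusters(records, k):
    """Return clusters (lists of record indices) of size between k and 2k-1."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    dim_count = len(records[0])

    def split(indices):
        # A part with fewer than 2k records cannot be halved k-anonymously.
        if len(indices) < 2 * k:
            return [indices]

        def spread(d):
            values = [records[i][d] for i in indices]
            return max(values) - min(values)

        # Cut along the attribute with the widest value range.
        d = max(range(dim_count), key=spread)
        ordered = sorted(indices, key=lambda i: records[i][d])
        mid = len(ordered) // 2  # median position; both halves have >= k records
        return split(ordered[:mid]) + split(ordered[mid:])

    return split(list(range(len(records))))


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    print(mondrian_clusters(data, k=2))
```

Because each level of the recursion only sorts disjoint parts, the total work stays close to linear in the number of records, at the price of clusters that ignore distances across the cut.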

Finally, we combine both techniques, ONA and Mondrian, to construct a new class of parameterized algorithms called \(\texttt{MONA}\). They are quite fast with time complexity between almost linear and quadratic, and deliver competitive utility compared to the MDAV approach.

Author information

Authors and Affiliations

Institut für Theoretische Informatik, Universität zu Lübeck, Lübeck, Germany
Florian Thaeter & Rüdiger Reischuk

Corresponding author

Correspondence to Rüdiger Reischuk.

Editor information

Editors and Affiliations

Università degli Studi di Milano, Milan, Italy
Pierangela Samarati
University of Twente, Enschede, The Netherlands
Marten van Sinderen
Università degli Studi di Milano, Milan, Italy
Sabrina De Capitani di Vimercati
University of Twente, Enschede, The Netherlands
Fons Wijnhoven

Cite this paper

Thaeter, F., Reischuk, R. (2023). Improving Time Complexity and Utility of k-anonymous Microaggregation. In: Samarati, P., van Sinderen, M., Vimercati, S.D.C.d., Wijnhoven, F. (eds) E-Business and Telecommunications. ICETE 2021. Communications in Computer and Information Science, vol 1795. Springer, Cham. https://doi.org/10.1007/978-3-031-36840-0_10

DOI: https://doi.org/10.1007/978-3-031-36840-0_10
Published: 22 July 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36839-4
Online ISBN: 978-3-031-36840-0
eBook Packages: Computer Science, Computer Science (R0)