Improving Time Complexity and Utility of k-anonymous Microaggregation

  • Conference paper
E-Business and Telecommunications (ICETE 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1795)

Abstract

Research in medicine, economics and the social sciences needs data about specific individuals. Such data should therefore be publicly available, but its publication must not violate the privacy of any individual. Microaggregation applied to databases is a standard technique to protect privacy. It clusters similar individuals into larger groups to achieve so-called k-anonymity: every individual is hidden in a cluster of size at least k. The data can then be made public for all kinds of analysis, whereas other concepts such as differential privacy keep the database secret and allow outsiders to ask only specific questions about the data.
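To make the idea concrete, here is a minimal sketch for the simplest case, a single numeric attribute. Records are sorted, consecutive runs of k records form the groups (the last group absorbs any remainder), and every value is replaced by its group mean, so each published value is shared by at least k records. The function name `microaggregate_1d` and the sample ages are illustrative assumptions; this is the classic univariate fixed-size scheme, not one of the algorithms proposed in the paper.

```python
from statistics import fmean


def microaggregate_1d(values, k):
    """Univariate fixed-size microaggregation (illustrative sketch).

    Sort the values, cut the sorted order into groups of k consecutive
    records (the last group also takes the remaining n mod k records), and
    replace every value by the mean of its group.
    """
    if len(values) < k:
        raise ValueError("need at least k records")
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    result = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        end = start + k if g < n_groups - 1 else len(values)
        members = order[start:end]
        mean = fmean(values[i] for i in members)
        for i in members:
            result[i] = mean  # published value shared by >= k records
    return result


ages = [23, 25, 31, 33, 35, 47, 52]
print(microaggregate_1d(ages, 3))  # three values become 26.33..., four become 41.75
```

Sorting makes the univariate case easy; with several attributes there is no natural order, which is why multivariate microaggregation relies on clustering heuristics.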

The modification of a database to achieve anonymity should be as small as possible to preserve its utility, that is, the loss of information should be minimized. In this respect, microaggregation typically performs much better than other anonymization techniques such as generalization or suppression. However, minimizing the information loss of k-anonymous microaggregation is an NP-hard optimization problem for \(k \ge 3\). Not only is it unlikely that optimal solutions can be computed efficiently; nontrivial approximation algorithms are lacking as well. Therefore, a number of heuristics have been developed, all with at least quadratic time complexity.
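The abstract does not spell out how information loss is measured. A measure widely used in the microaggregation literature, assumed here, is the within-cluster sum of squared errors SSE divided by the total sum of squares SST, expressed as a percentage (smaller is better). Under this measure the optimization task is to find a partition into clusters of size at least k with minimum SSE. The sketch below evaluates an arbitrary partition; the function names are our own.

```python
def centroid(records, idx):
    """Coordinate-wise mean of the records whose indices are in idx."""
    idx = list(idx)
    dim = len(records[0])
    return [sum(records[i][d] for i in idx) / len(idx) for d in range(dim)]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def information_loss(records, clusters):
    """Information loss in percent: 100 * SSE / SST.

    SSE: squared distances of records to the centroid of their own cluster.
    SST: squared distances of records to the centroid of the whole data set.
    """
    overall = centroid(records, range(len(records)))
    sst = sum(sq_dist(r, overall) for r in records)
    sse = 0.0
    for idx in clusters:
        c = centroid(records, idx)
        sse += sum(sq_dist(records[i], c) for i in idx)
    return 100.0 * sse / sst if sst > 0 else 0.0


data = [(0, 0), (1, 0), (10, 0), (11, 0)]
print(information_loss(data, [[0, 1], [2, 3]]))  # about 0.99 (= 100/101)
print(information_loss(data, [[0, 1, 2, 3]]))    # 100.0: a single cluster loses everything
```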

This paper improves microaggregation significantly and provides a tradeoff between computational effort and utility. First, we analyse and tune in detail the maximum distance methodology, the common approach to generating a clustering that provides k-anonymity. We review the methods proposed so far and design a new algorithm \(\texttt{MDAV}^{*}_\gamma \) that gives better utility on standard benchmarks.
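For orientation, the following is a sketch of the classic MDAV (maximum distance to average vector) heuristic as it is commonly described in the microaggregation literature; it is not the paper's \(\texttt{MDAV}^{*}_\gamma\), whose details the abstract does not give. While at least 3k records remain, it takes the record r farthest from the centroid and forms a cluster from r and its k-1 nearest neighbours, then does the same for the remaining record s farthest from r. The function name and point format (tuples of numbers) are assumptions.

```python
import math


def mdav(points, k):
    """Classic MDAV sketch: returns clusters (lists of indices) of size >= k."""
    if len(points) < k:
        raise ValueError("need at least k records")
    remaining = list(range(len(points)))
    clusters = []

    def centroid(idx):
        dim = len(points[0])
        return [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]

    def farthest_from(ref):
        return max(remaining, key=lambda i: math.dist(points[i], ref))

    def take_group(ref_index):
        # Cluster = the k remaining records closest to points[ref_index] (itself included).
        nonlocal remaining
        ref = points[ref_index]
        group = sorted(remaining, key=lambda i: math.dist(points[i], ref))[:k]
        chosen = set(group)
        remaining = [i for i in remaining if i not in chosen]
        clusters.append(group)

    while len(remaining) >= 3 * k:
        r = farthest_from(centroid(remaining))
        take_group(r)
        s = farthest_from(points[r])  # remaining record farthest from r
        take_group(s)

    if len(remaining) >= 2 * k:
        take_group(farthest_from(centroid(remaining)))
    clusters.append(remaining)  # between k and 2k - 1 records are left
    return clusters


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
print(mdav(pts, 3))  # -> [[8, 6, 7], [1, 0, 2], [3, 4, 5]]
```

Every group formed requires a pass (here, a sort) over all remaining records, so with about n/k groups the cost grows roughly quadratically in n; a partial selection instead of a full sort would remove the logarithmic factor.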

A different approach with quadratic time complexity, based on Lloyd's algorithm and named ONA, has been proposed but not completely analysed. This paper fills this gap and improves several steps, resulting in a new algorithm \(\texttt{ONA}^{*}\) with better utility.
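The exact steps of ONA and \(\texttt{ONA}^{*}\) are not given in the abstract. The code below is a hypothetical Lloyd-style local search, shown only to illustrate how Lloyd's alternation of centroid update and reassignment can be constrained so that k-anonymity is preserved: a record moves to a strictly closer centroid only if its current cluster keeps at least k members. The function name `refine_k_anonymous` and the `max_rounds` cap are our own assumptions.

```python
import math


def refine_k_anonymous(points, clusters, k, max_rounds=50):
    """Hypothetical Lloyd-style refinement; no cluster ever drops below k records.

    Each round (1) recomputes centroids, then (2) moves a record to a strictly
    closer centroid, but only if its current cluster keeps at least k members.
    """
    clusters = [list(c) for c in clusters]
    dim = len(points[0])
    for _ in range(max_rounds):
        centroids = [
            [sum(points[i][d] for i in c) / len(c) for d in range(dim)] for c in clusters
        ]
        moved = False
        for a, members in enumerate(clusters):
            for i in list(members):
                if len(members) <= k:
                    break  # cluster a is at the minimum size and may not shrink
                b = min(range(len(clusters)), key=lambda j: math.dist(points[i], centroids[j]))
                if math.dist(points[i], centroids[b]) < math.dist(points[i], centroids[a]):
                    members.remove(i)
                    clusters[b].append(i)
                    moved = True
        if not moved:
            break
    return clusters


pts = [(0,), (1,), (2,), (10,), (11,), (12,)]
print(refine_k_anonymous(pts, [[0, 1, 2, 3], [4, 5]], k=2))  # -> [[0, 1, 2], [4, 5, 3]]
```

Each accepted move brings a record strictly closer to a fixed centroid, and recomputing centroids as means can only lower the cost further, so SSE strictly decreases in every round with a move. This is the usual termination argument for Lloyd's algorithm; one round costs about n times the number of clusters distance evaluations.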

Mondrian is another approach to clustering data that can be adapted for microaggregation. It is quite fast, but typically achieves very poor utility. We improve on this and design an almost linear time algorithm that gives acceptable utility, although worse than that of the quadratic time algorithms.
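The abstract does not describe the Mondrian variant used in the paper. As a rough illustration, the sketch below follows the top-down idea of LeFevre et al.'s Mondrian: a part with at least 2k records is cut along the dimension with the widest value range. It simplifies the original by cutting at the median position of the sorted records rather than at a median value, which guarantees that both halves keep at least k records even when values tie. The function name `mondrian_partition` is our own.

```python
def spread(points, part, d):
    """Range of the values of dimension d within the given part."""
    values = [points[i][d] for i in part]
    return max(values) - min(values)


def mondrian_partition(points, k):
    """Simplified Mondrian-style top-down partitioning.

    Parts with fewer than 2k records become clusters, so every cluster
    has between k and 2k - 1 records.
    """
    if len(points) < k:
        raise ValueError("need at least k records")
    dim = len(points[0])
    clusters = []
    stack = [list(range(len(points)))]
    while stack:
        part = stack.pop()
        if len(part) < 2 * k:
            clusters.append(part)  # cannot be cut without a side smaller than k
            continue
        d = max(range(dim), key=lambda dd: spread(points, part, dd))
        ordered = sorted(part, key=lambda i: points[i][d])
        mid = len(ordered) // 2  # both halves have >= k records since len >= 2k
        stack.append(ordered[:mid])
        stack.append(ordered[mid:])
    return clusters


print(mondrian_partition([(x,) for x in range(10)], 3))  # -> [[5, 6, 7, 8, 9], [0, 1, 2, 3, 4]]
```

The cutting depth is about log2(n/k) and each level sorts all n records once, so the total cost is O(n log n · log(n/k)), which is nearly linear.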

Finally, we combine both techniques, ONA and Mondrian, to construct a new class of parameterized algorithms called \(\texttt{MONA}\). They are quite fast, with time complexity between almost linear and quadratic, and deliver utility competitive with that of the MDAV approach.
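The abstract does not say how MONA combines the two techniques. Assuming, for illustration only, that Mondrian-style cuts first split the n records into b blocks of sizes \(n_1, \dots, n_b \le m\) (with m a hypothetical block-size parameter) and that a quadratic-time method then runs inside each block, the running time interpolates as follows, using \(n_j^2 \le m\,n_j\) and \(\sum_j n_j = n\):

$$
T(n, m) \;=\; \underbrace{O\!\left(n \log n \cdot \log \tfrac{n}{m}\right)}_{\text{cuts}} \;+\; \sum_{j=1}^{b} O\!\left(n_j^{2}\right)
\;\le\; O\!\left(n \log^{2} n\right) + O\!\left(m \sum_{j=1}^{b} n_j\right)
\;=\; O\!\left(n \log^{2} n + n\,m\right).
$$

With m a constant multiple of k the bound is almost linear; with m = n no cut is made and the bound becomes the \(O(n^2)\) of the block method alone.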

Author information

Correspondence to Rüdiger Reischuk.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Thaeter, F., Reischuk, R. (2023). Improving Time Complexity and Utility of k-anonymous Microaggregation. In: Samarati, P., van Sinderen, M., Vimercati, S.D.C.d., Wijnhoven, F. (eds) E-Business and Telecommunications. ICETE 2021. Communications in Computer and Information Science, vol 1795. Springer, Cham. https://doi.org/10.1007/978-3-031-36840-0_10

  • DOI: https://doi.org/10.1007/978-3-031-36840-0_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36839-4

  • Online ISBN: 978-3-031-36840-0

  • eBook Packages: Computer Science, Computer Science (R0)
