An Optimization-Based Decomposition Heuristic for the Microaggregation Problem

Castro, Jordi; Gentile, Claudio; Spagnolo-Arrizabalaga, Enric

doi:10.1007/978-3-031-13945-1_1


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13463)

Included in the following conference series:

International Conference on Privacy in Statistical Databases

Abstract

Given a set of points, the microaggregation problem aims to find a clustering with minimum sum of squared errors (SSE), where each cluster contains at least $k$ points. The points of each cluster are replaced by the cluster centroid, so the released data satisfy $k$-anonymity. Microaggregation is considered one of the most effective techniques for protecting numerical microdata. Traditionally, non-optimal solutions to the microaggregation problem are obtained by heuristic approaches. Recently, the authors of this paper presented a mixed integer linear optimization (MILO) approach based on column generation for computing tight solutions and lower bounds for the microaggregation problem. However, MILO can be computationally expensive for large datasets. In this work we present a new heuristic that combines three blocks: (1) a decomposition of the dataset into subsets, (2) the MILO column generation algorithm applied to each subset in order to obtain a valid microaggregation, and (3) a local search improvement algorithm to obtain the final clustering. Preliminary computational results show that this approach was able to match (and in some cases improve upon) some of the best solutions (i.e., those of smallest SSE) reported in the literature for the Tarragona and Census datasets with $k\in \{3, 5, 10\}$.
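To make the objective concrete: with clusters $C_1,\dots,C_g$ and centroids $\bar{x}_j$, the SSE is $\sum_{j=1}^{g}\sum_{i\in C_j}\|x_i-\bar{x}_j\|^2$, minimized subject to $|C_j|\ge k$ for every $j$. The Python sketch below only mirrors the three-block structure described above, under explicit assumptions. The decomposition (sort the points lexicographically and cut them into contiguous subsets), the per-subset clustering (consecutive groups of $k$ sorted points, a naive stand-in and not the authors' MILO column generation algorithm) and the local search (single-point moves that lower the SSE) are hypothetical choices made for illustration. None of the function names (`heuristic`, `local_search`, `chunk`, ...) or the `subset_size` parameter come from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Clustering = List[List[int]]  # each cluster is a list of point indices


def centroid(points: Sequence[Point], idx: Sequence[int]) -> Point:
    """Component-wise mean of the points whose indices are in idx."""
    dim = len(points[idx[0]])
    return tuple(sum(points[i][d] for i in idx) / len(idx) for d in range(dim))


def sse(points: Sequence[Point], clusters: Clustering) -> float:
    """Sum over clusters of squared Euclidean distances to the cluster centroid."""
    total = 0.0
    for idx in clusters:
        c = centroid(points, idx)
        total += sum(sum((a - b) ** 2 for a, b in zip(points[i], c)) for i in idx)
    return total


def is_k_partition(clusters: Clustering, n: int, k: int) -> bool:
    """True if the clusters partition {0, ..., n-1} and each has at least k points."""
    members = sorted(i for idx in clusters for i in idx)
    return members == list(range(n)) and all(len(idx) >= k for idx in clusters)


def microaggregate(points: Sequence[Point], clusters: Clustering) -> List[Point]:
    """Replace every point by the centroid of its cluster (the protected release)."""
    out = list(points)
    for idx in clusters:
        c = centroid(points, idx)
        for i in idx:
            out[i] = c
    return out


def chunk(seq: List[int], size: int) -> Clustering:
    """Cut seq into consecutive pieces of length `size`; a short tail joins the last piece."""
    pieces = [seq[s:s + size] for s in range(0, len(seq), size)]
    if len(pieces) > 1 and len(pieces[-1]) < size:
        tail = pieces.pop()
        pieces[-1].extend(tail)
    return pieces


def try_move(points: Sequence[Point], clusters: Clustering, k: int) -> bool:
    """Apply the first single-point move that lowers the SSE; return False if none exists."""
    base = sse(points, clusters)
    for a, src in enumerate(clusters):
        if len(src) <= k:
            continue  # removing a point would leave fewer than k points
        for i in src:
            for b in range(len(clusters)):
                if b == a:
                    continue
                trial = [list(c) for c in clusters]
                trial[a].remove(i)
                trial[b].append(i)
                if sse(points, trial) < base - 1e-12:
                    clusters[:] = trial
                    return True
    return False


def local_search(points: Sequence[Point], clusters: Clustering, k: int) -> Clustering:
    """Hypothetical block (3): repeat improving moves; SSE strictly decreases, so it stops."""
    clusters = [list(c) for c in clusters]
    while try_move(points, clusters, k):
        pass
    return clusters


def heuristic(points: Sequence[Point], k: int, subset_size: int) -> Clustering:
    """Illustrative pipeline only: every block below is an assumed stand-in."""
    assert len(points) >= k and subset_size >= k
    order = sorted(range(len(points)), key=lambda i: points[i])  # lexicographic order
    clusters: Clustering = []
    for subset in chunk(order, subset_size):  # block (1): hypothetical decomposition
        clusters.extend(chunk(subset, k))  # block (2): naive stand-in for MILO column generation
    return local_search(points, clusters, k)  # block (3): single-point-move local search
```

For example, `heuristic([(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,), (13.0,)], 3, 3)` groups the three small values and the four large ones. Only single-point moves are tried, and only out of clusters with more than $k$ points. Swap moves, which are needed to make progress when every cluster has exactly $k$ points, are left out for brevity.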

Supported by grant MCIU/AEI/FEDER RTI2018-097580-B-I00.



References

Aloise, D., Hansen, P., Rocha, C., Santi, É.: Column generation bounds for numerical microaggregation. J. Global Optim. 60(2), 165–182 (2014). https://doi.org/10.1007/s10898-014-0149-3
Aloise, D., Araújo, A.: A derivative-free algorithm for refining numerical microaggregation solutions. Int. Trans. Oper. Res. 22, 693–712 (2015)
Brand, R., Domingo-Ferrer, J., Mateo-Sanz, J.M.: Reference data sets to test and compare SDC methods for protection of numerical microdata. European Project IST-2000-25069 CASC (2002). http://neon.vb.cbs.nl/casc, https://research.cbs.nl/casc/CASCtestsets.html
Castro, J., Gentile, C., Spagnolo-Arrizabalaga, E.: An algorithm for the microaggregation problem using column generation. Comput. Oper. Res. 144, 105817 (2022). https://doi.org/10.1016/j.cor.2022.105817
Defays, D., Anwar, N.: Micro-aggregation: a generic method. In: Proceedings of Second International Symposium Statistical Confidentiality, pp. 69–78 (1995)
Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans. Knowl. Data Eng. 14, 189–201 (2002)
Domingo-Ferrer, J., Martínez-Ballesté, A., Mateo-Sanz, J.M., Sebé, F.: Efficient multivariate data-oriented microaggregation. VLDB J. 15, 355–369 (2006)
Domingo-Ferrer, J., Torra, V.: Ordinal, continuous and heterogeneous $k$-anonymity through microaggregation. Data Mining Knowl. Disc. 11, 195–212 (2005)
Ji, X., Mitchell, J.E.: Branch-and-price-and-cut on the clique partitioning problem with minimum clique size requirement. Discr. Optim. 4, 87–102 (2007)
Ghosh, J., Liu, A.: K-means. In: The Top Ten Algorithms in Data Mining, pp. 21–35. Taylor & Francis, Boca Raton (2009)
Hansen, S., Mukherjee, S.: A polynomial algorithm for optimal univariate microaggregation. IEEE Trans. Knowl. Data Eng. 15, 1043–1044 (2003)
Kaufman, L., Rousseeuw, P.J.: Partitioning around medoids (Program PAM). In: Wiley Series in Probability and Statistics, pp. 68–125. John Wiley & Sons, Hoboken (1990)
Khomnotai, L., Lin, J.-L., Peng, Z.-Q., Samanta, A.: Iterative group decomposition for refining microaggregation solutions. Symmetry 10, 262 (2018). https://doi.org/10.3390/sym10070262
Maya-López, A., Casino, F., Solanas, A.: Improving multivariate microaggregation through Hamiltonian paths and optimal univariate microaggregation. Symmetry 13, 916 (2021). https://doi.org/10.3390/sym13060916
Oganian, A., Domingo-Ferrer, J.: On the complexity of optimal microaggregation for statistical disclosure control. Statist. J. U. N. Econ. Com. Eur. 18, 345–354 (2001)
Panagiotakis, C., Tziritas, G.: Successive group selection for microaggregation. IEEE Trans. Knowl. Data Eng. 25, 1191–1195 (2013)
Soria-Comas, J., Domingo-Ferrer, J., Mulero, R.: Efficient near-optimal variable-size microaggregation. In: Torra, V., Narukawa, Y., Pasi, G., Viviani, M. (eds.) MDAI 2019. LNCS (LNAI), vol. 11676, pp. 333–345. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26773-5_29
Spagnolo-Arrizabalaga, E.: On the use of Integer Programming to pursue Optimal Microaggregation. B.Sc. thesis, Universitat Politècnica de Catalunya, School of Mathematics and Statistics, Barcelona (2016)
Solanas, A., Martínez-Ballesté, A.: V-MDAV: a multivariate microaggregation with variable group size. In: Proceedings of COMPSTAT Symposium IASC, pp. 917–925 (2006)
Sweeney, L.: $k$-anonymity: a model for protecting privacy. Int. J. Uncertain Fuzziness Knowl. Based Syst. 10, 557–570 (2002)


Author information

Authors and Affiliations

Department of Statistics and Operations Research, Universitat Politècnica de Catalunya, Jordi Girona 1–3, 08034, Barcelona, Catalonia
Jordi Castro & Enric Spagnolo-Arrizabalaga
Istituto di Analisi dei Sistemi ed Informatica “A. Ruberti”, Consiglio Nazionale delle Ricerche, Rome, Italy
Claudio Gentile


Corresponding author

Correspondence to Jordi Castro.

Editor information

Editors and Affiliations

Universitat Rovira i Virgili, Tarragona, Catalonia, Spain
Josep Domingo-Ferrer
Télécom SudParis, Palaiseau, France
Maryline Laurent


About this paper

Cite this paper

Castro, J., Gentile, C., Spagnolo-Arrizabalaga, E. (2022). An Optimization-Based Decomposition Heuristic for the Microaggregation Problem. In: Domingo-Ferrer, J., Laurent, M. (eds) Privacy in Statistical Databases. PSD 2022. Lecture Notes in Computer Science, vol 13463. Springer, Cham. https://doi.org/10.1007/978-3-031-13945-1_1


DOI: https://doi.org/10.1007/978-3-031-13945-1_1
Published: 14 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13944-4
Online ISBN: 978-3-031-13945-1
eBook Packages: Computer Science, Computer Science (R0)
