Abstract
Given a set of points, the microaggregation problem aims to find a clustering with minimum sum of squared errors (SSE), where each cluster contains at least k points. The points in each cluster are replaced by the cluster centroid, thus satisfying k-anonymity. Microaggregation is considered one of the most effective techniques for numerical microdata protection. Traditionally, non-optimal solutions to the microaggregation problem have been obtained by heuristic approaches. Recently, the authors of this paper presented a mixed integer linear optimization (MILO) approach based on column generation for computing tight solutions and lower bounds for the microaggregation problem. However, MILO can be computationally expensive for large datasets. In this work we present a new heuristic that combines three blocks: (1) a decomposition of the dataset into subsets, (2) the MILO column generation algorithm applied to each subset in order to obtain a valid microaggregation, and (3) a local search improvement algorithm to obtain the final clustering. Preliminary computational results show that this approach was able to provide (and even improve upon) some of the best solutions (i.e., those of smallest SSE) reported in the literature for the Tarragona and Census datasets with \(k\in \{3,5,10\}\).
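The three-block structure can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation, and every function name in it is hypothetical. The SSE of a cluster is the sum of squared Euclidean distances of its points to the cluster centroid. Three choices are assumptions made for brevity: the decomposition rule (sort by the highest-variance coordinate and cut into chunks of `subset_size` points); the per-subset solver, a dynamic program that splits each ordered chunk into consecutive groups of k to 2k-1 points and serves only as a stand-in for the MILO column generation algorithm; and the local search, which moves a single point between clusters.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(p[j] for p in points) / len(points) for j in range(len(points[0]))]


def cluster_sse(X: List[Point], idx: List[int]) -> float:
    """Sum of squared distances from the points X[i], i in idx, to their centroid."""
    c = centroid([X[i] for i in idx])
    return sum(sum((X[i][j] - c[j]) ** 2 for j in range(len(c))) for i in idx)


def total_sse(X: List[Point], clusters: List[List[int]]) -> float:
    return sum(cluster_sse(X, idx) for idx in clusters)


def decompose(X: List[Point], k: int, subset_size: int) -> List[List[int]]:
    """Block (1), hypothetical rule: sort by the highest-variance coordinate and cut
    into chunks of subset_size points; a tail with fewer than k points is merged back."""
    def spread(j: int) -> float:
        m = sum(p[j] for p in X) / len(X)
        return sum((p[j] - m) ** 2 for p in X)
    axis = max(range(len(X[0])), key=spread)
    order = sorted(range(len(X)), key=lambda i: X[i][axis])
    chunks = [order[s:s + subset_size] for s in range(0, len(order), subset_size)]
    if len(chunks) > 1 and len(chunks[-1]) < k:
        chunks[-2].extend(chunks.pop())
    return chunks


def solve_subset(X: List[Point], chunk: List[int], k: int) -> List[List[int]]:
    """Block (2) stand-in (NOT column generation): optimal split of the ordered chunk
    into consecutive groups of k..2k-1 points, minimising multivariate SSE."""
    n, INF = len(chunk), float("inf")
    best, cut = [0.0] + [INF] * n, [0] * (n + 1)
    for i in range(k, n + 1):
        for s in range(k, min(2 * k - 1, i) + 1):
            if best[i - s] < INF:
                cost = best[i - s] + cluster_sse(X, chunk[i - s:i])
                if cost < best[i]:
                    best[i], cut[i] = cost, i - s
    groups, i = [], n
    while i > 0:  # walk the cut points back to recover the groups
        groups.append(chunk[cut[i]:i])
        i = cut[i]
    return groups[::-1]


def local_search(X: List[Point], clusters: List[List[int]], k: int) -> List[List[int]]:
    """Block (3), assumed move heuristic: relocate one point whenever total SSE
    strictly decreases and the source cluster keeps at least k points."""
    clusters = [list(c) for c in clusters]
    improved = True
    while improved:
        improved = False
        for a in range(len(clusters)):
            for i in list(clusters[a]):
                if len(clusters[a]) <= k:  # a removal would violate k-anonymity
                    break
                src = [p for p in clusters[a] if p != i]
                for b in range(len(clusters)):
                    if b == a:
                        continue
                    dst = clusters[b] + [i]
                    before = cluster_sse(X, clusters[a]) + cluster_sse(X, clusters[b])
                    if cluster_sse(X, src) + cluster_sse(X, dst) < before - 1e-12:
                        clusters[a], clusters[b] = src, dst
                        improved = True
                        break
    return clusters


def microaggregate(X: List[Point], k: int, subset_size: int) -> Tuple[List[List[int]], list]:
    """Run blocks (1)-(3), then replace every point by its cluster centroid."""
    assert len(X) >= k and subset_size >= k
    clusters = [g for ch in decompose(X, k, subset_size) for g in solve_subset(X, ch, k)]
    clusters = local_search(X, clusters, k)
    X_anon: list = [None] * len(X)
    for idx in clusters:
        c = centroid([X[i] for i in idx])
        for i in idx:
            X_anon[i] = c
    return clusters, X_anon
```

For example, with the seven 2-D points `(1.0, 2.0), (1.2, 1.9), (0.9, 2.1), (8.0, 8.0), (8.1, 7.9), (7.9, 8.2), (4.0, 5.0)`, `k=3` and `subset_size=4`, the sketch returns one cluster of four points (the three points near (1, 2) plus the outlier (4, 5)) and one cluster of three points near (8, 8). Unlike the MILO approach, this stand-in solver yields no lower bound on the optimal SSE; it only guarantees a feasible microaggregation for the chosen ordering.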
Supported by grant MCIU/AEI/FEDER RTI2018-097580-B-I00.
References
Aloise, D., Hansen, P., Rocha, C., Santi, É.: Column generation bounds for numerical microaggregation. J. Global Optim. 60(2), 165–182 (2014). https://doi.org/10.1007/s10898-014-0149-3
Aloise, D., Araújo, A.: A derivative-free algorithm for refining numerical microaggregation solutions. Int. Trans. Oper. Res. 22, 693–712 (2015)
Brand, R., Domingo-Ferrer, J., Mateo-Sanz, J.M.: Reference data sets to test and compare SDC methods for protection of numerical microdata. European Project IST-2000-25069 CASC (2002). http://neon.vb.cbs.nl/casc, https://research.cbs.nl/casc/CASCtestsets.html
Castro, J., Gentile, C., Spagnolo-Arrizabalaga, E.: An algorithm for the microaggregation problem using column generation. Comput. Oper. Res. 144, 105817 (2022). https://doi.org/10.1016/j.cor.2022.105817
Defays, D., Anwar, N.: Micro-aggregation: a generic method. In: Proceedings of Second International Symposium Statistical Confidentiality, pp. 69–78 (1995)
Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans. Knowl. Data Eng. 14, 189–201 (2002)
Domingo-Ferrer, J., Martínez-Ballesté, A., Mateo-Sanz, J.M., Sebé, F.: Efficient multivariate data-oriented microaggregation. VLDB J. 15, 355–369 (2006)
Domingo-Ferrer, J., Torra, V.: Ordinal, continuous and heterogeneous \(k\)-anonymity through microaggregation. Data Mining Knowl. Disc. 11, 195–212 (2005)
Ji, X., Mitchell, J.E.: Branch-and-price-and-cut on the clique partitioning problem with minimum clique size requirement. Discr. Optim. 4, 87–102 (2007)
Ghosh, J., Liu, A.: K-means. In: The Top Ten Algorithms in Data Mining, pp. 21–35. Taylor & Francis, Boca Raton (2009)
Hansen, S., Mukherjee, S.: A polynomial algorithm for optimal univariate microaggregation. IEEE Trans. Knowl. Data Eng. 15, 1043–1044 (2003)
Kaufman, L., Rousseeuw, P.J.: Partitioning around medoids (Program PAM). In: Wiley Series in Probability and Statistics, pp. 68–125. John Wiley & Sons, Hoboken (1990)
Khomnotai, L., Lin, J.-L., Peng, Z.-Q., Samanta, A.: Iterative group decomposition for refining microaggregation solutions. Symmetry 10, 262 (2018). https://doi.org/10.3390/sym10070262
Maya-López, A., Casino, F., Solanas, A.: Improving multivariate microaggregation through Hamiltonian paths and optimal univariate microaggregation. Symmetry 13, 916 (2021). https://doi.org/10.3390/sym13060916
Oganian, A., Domingo-Ferrer, J.: On the complexity of optimal microaggregation for statistical disclosure control. Statist. J. U. N. Econ. Com. Eur. 18, 345–354 (2001)
Panagiotakis, C., Tziritas, G.: Successive group selection for microaggregation. IEEE Trans. Knowl. Data Eng. 25, 1191–1195 (2013)
Soria-Comas, J., Domingo-Ferrer, J., Mulero, R.: Efficient near-optimal variable-size microaggregation. In: Torra, V., Narukawa, Y., Pasi, G., Viviani, M. (eds.) MDAI 2019. LNCS (LNAI), vol. 11676, pp. 333–345. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26773-5_29
Spagnolo-Arrizabalaga, E.: On the use of Integer Programming to pursue Optimal Microaggregation. B.Sc. thesis, Universitat Politècnica de Catalunya, School of Mathematics and Statistics, Barcelona (2016)
Solanas, A., Martínez-Ballesté, A.: V-MDAV: a multivariate microaggregation with variable group size. In: Proceedings of COMPSTAT Symposium IASC, pp. 917–925 (2006)
Sweeney, L.: \(k\)-anonymity: a model for protecting privacy. Int. J. Uncertain Fuzziness Knowl. Based Syst. 10, 557–570 (2002)
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Castro, J., Gentile, C., Spagnolo-Arrizabalaga, E. (2022). An Optimization-Based Decomposition Heuristic for the Microaggregation Problem. In: Domingo-Ferrer, J., Laurent, M. (eds) Privacy in Statistical Databases. PSD 2022. Lecture Notes in Computer Science, vol 13463. Springer, Cham. https://doi.org/10.1007/978-3-031-13945-1_1
DOI: https://doi.org/10.1007/978-3-031-13945-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13944-4
Online ISBN: 978-3-031-13945-1
eBook Packages: Computer Science, Computer Science (R0)