Column generation bounds for numerical microaggregation

Aloise, Daniel; Hansen, Pierre; Rocha, Caroline; Santi, Éverton

doi:10.1007/s10898-014-0149-3

Published: 18 February 2014

Volume 60, pages 165–182 (2014)
Journal of Global Optimization

Abstract

The main challenge in disclosing private data is to share the information contained in databases while protecting people from being individually identified. Microaggregation is a family of methods for statistical disclosure control. Its principle is that confidentiality rules permit the publication of individual records if they are partitioned into groups of size greater than or equal to a fixed threshold value, in which no record is more representative than the others in the same group. Applying such rules means replacing individual values by values computed from small groups (microaggregates) before the data are published. This work proposes a column generation algorithm for numerical microaggregation whose pricing problem is solved by a specialized branch-and-bound algorithm. The algorithm finds, for the first time, lower bounds for instances of three real-world datasets commonly used in the literature. Furthermore, new best known solutions are obtained for these instances by a simple heuristic that uses the generated columns.
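To make the microaggregation principle concrete, the following minimal sketch replaces each record by the centroid of its group and reports the within-group sum of squared errors. It rests on assumptions that the abstract does not state: that groups are summarized by their mean, and that information loss is measured by the sum of squared errors, which is the usual choice for numerical microaggregation. The function name `microaggregate` and the sample data are hypothetical. The sketch only evaluates a partition that is given; it does not implement the column generation algorithm of the paper.

```python
from statistics import fmean


def microaggregate(records, groups, k):
    """Replace each record by the centroid of its group (illustrative only).

    records: list of equal-length numeric tuples
    groups:  list of lists of record indices, assumed to form a partition
    k:       minimum allowed group size (the confidentiality threshold)
    Returns the masked records and the within-group sum of squared errors.
    """
    # Every record index must appear in exactly one group.
    seen = sorted(i for g in groups for i in g)
    if seen != list(range(len(records))):
        raise ValueError("groups must partition the record indices")
    # Confidentiality rule: each group has at least k records.
    if any(len(g) < k for g in groups):
        raise ValueError("every group needs at least k records")

    dims = len(records[0])
    masked = [None] * len(records)
    sse = 0.0
    for g in groups:
        # Assumed aggregate: the component-wise mean of the group.
        centroid = tuple(fmean(records[i][d] for i in g) for d in range(dims))
        for i in g:
            masked[i] = centroid
            sse += sum((x - c) ** 2 for x, c in zip(records[i], centroid))
    return masked, sse


# Hypothetical data: four two-dimensional records, threshold k = 2.
records = [(1.0, 2.0), (3.0, 4.0), (10.0, 10.0), (12.0, 14.0)]
masked, sse = microaggregate(records, [[0, 1], [2, 3]], k=2)
print(masked)
print(sse)
```

Under these assumptions, an optimization method for microaggregation searches over partitions for one that keeps `sse` small while every group respects the threshold `k`.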

Acknowledgments

Research of the first author has been supported by the National Council for Scientific and Technological Development—CNPq/Brazil Grant Numbers 474231/2010-0 and 305070/2011-8. The authors also thank Prof. Costas Panagiotakis for providing the Tarragona, Census and Eia datasets.

Author information

Authors and Affiliations

Universidade Federal do Rio Grande do Norte, Campus Universitário s/n, Natal, RN, 59072-970, Brazil
Daniel Aloise, Caroline Rocha & Éverton Santi
GERAD and HEC Montréal, 3000, Chemin de la Côte-Sainte-Catherine, Montreal, QC, H3T 2A7, Canada
Pierre Hansen

Corresponding author

Correspondence to Daniel Aloise.

About this article

Cite this article

Aloise, D., Hansen, P., Rocha, C. et al. Column generation bounds for numerical microaggregation. J Glob Optim 60, 165–182 (2014). https://doi.org/10.1007/s10898-014-0149-3

Received: 13 February 2013
Accepted: 14 January 2014
Published: 18 February 2014
Issue Date: October 2014
DOI: https://doi.org/10.1007/s10898-014-0149-3
