Column generation bounds for numerical microaggregation

Journal of Global Optimization

Abstract

The biggest challenge in disclosing private data is to share the information contained in databases while protecting people from being individually identified. Microaggregation is a family of methods for statistical disclosure control. Its principle is that confidentiality rules permit the publication of individual records if the records are partitioned into groups whose size is greater than or equal to a fixed threshold, so that no record is more representative than the others in its group. Applying such rules means that, before publication, individual values are replaced by values computed from these small groups (microaggregates). This work proposes a column generation algorithm for numerical microaggregation whose pricing problem is solved by a specialized branch-and-bound. The algorithm finds, for the first time, lower bounds for instances from three real-world datasets commonly used in the literature. Furthermore, a simple heuristic applied to the generated columns yields new best known solutions for these instances.
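The abstract gives no formulation, so what follows is a hedged illustration rather than the authors' method. A standard way to pose numerical microaggregation with minimum group size k is as a set-partitioning problem: each candidate group C of records is a column whose cost is SSE(C), the sum of squared Euclidean distances of its records to the group centroid, and every record must be covered exactly once. Groups can be limited to sizes k through 2k-1, because splitting a larger group into two groups of size at least k never increases the SSE. In column generation, the pricing problem asks for a group with negative reduced cost, i.e. SSE(C) minus the sum of the dual values of its records. The Python sketch below assumes this model, and all function names are hypothetical. `price_column` prices by brute-force enumeration, a stand-in for the specialized branch-and-bound mentioned in the abstract. `lagrangian_lower_bound` applies a generic bound, valid for any dual vector, that relies on no partition having more than n // k groups. `optimal_partition` solves tiny instances exactly by dynamic programming over subsets so that the bound can be checked.

```python
from itertools import combinations
from math import inf


def group_sse(points, group):
    """Within-group sum of squared Euclidean distances to the group centroid."""
    dim = len(points[0])
    size = len(group)
    centroid = [sum(points[i][d] for i in group) / size for d in range(dim)]
    return sum((points[i][d] - centroid[d]) ** 2 for i in group for d in range(dim))


def price_column(points, duals, k):
    """Pricing by brute-force enumeration (hypothetical stand-in for a
    branch-and-bound): return the minimum reduced cost
    SSE(C) - sum(duals[i] for i in C) over groups with k <= |C| <= 2k-1,
    together with the group that attains it."""
    n = len(points)
    best_cost, best_group = inf, None
    for size in range(k, min(2 * k - 1, n) + 1):
        for group in combinations(range(n), size):
            reduced = group_sse(points, group) - sum(duals[i] for i in group)
            if reduced < best_cost:
                best_cost, best_group = reduced, group
    return best_cost, best_group


def lagrangian_lower_bound(points, duals, k):
    """Lower bound on the optimal SSE that holds for ANY dual vector.

    Every feasible partition uses at most n // k groups and each group's
    reduced cost is at least the pricing minimum, hence
    SSE* >= sum(duals) + (n // k) * min(0, minimum reduced cost)."""
    best_reduced, _ = price_column(points, duals, k)
    return sum(duals) + (len(points) // k) * min(0.0, best_reduced)


def optimal_partition(points, k):
    """Exact optimum of the set-partitioning model by dynamic programming
    over subsets of records (bitmasks). Only practical for very small n."""
    n = len(points)
    if n < k:
        raise ValueError("need at least k records")
    full = (1 << n) - 1
    cost = [inf] * (full + 1)
    choice = [None] * (full + 1)
    cost[0] = 0.0
    for mask in range(1, full + 1):
        members = [i for i in range(n) if mask >> i & 1]
        first, rest = members[0], members[1:]
        # The group covering the lowest-indexed record must contain it;
        # pick k-1 to 2k-2 companions from the other records in the mask.
        for extra in range(k - 1, min(2 * k - 1, len(members))):
            for others in combinations(rest, extra):
                group = (first,) + others
                remaining = mask ^ sum(1 << i for i in group)
                if cost[remaining] == inf:
                    continue  # leftover records cannot form valid groups
                value = cost[remaining] + group_sse(points, group)
                if value < cost[mask]:
                    cost[mask], choice[mask] = value, group
    # Walk back through the stored choices to recover the groups.
    groups, mask = [], full
    while mask:
        groups.append(choice[mask])
        mask ^= sum(1 << i for i in choice[mask])
    return cost[full], groups


if __name__ == "__main__":
    pts = [[1], [2], [3], [10], [11], [12]]
    sse, groups = optimal_partition(pts, 3)
    print(sse, groups)  # 4.0 [(0, 1, 2), (3, 4, 5)]
    print(lagrangian_lower_bound(pts, [2 / 3] * 6, 3))  # close to 4.0
```

In an actual column generation loop the duals would come from the restricted master LP, which is re-solved each time a group with negative reduced cost is added; here they are fixed by hand. The enumeration in `price_column` grows combinatorially with n, which illustrates why a dedicated pricing method matters on real datasets.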

Acknowledgments

The research of the first author was supported by the National Council for Scientific and Technological Development (CNPq/Brazil), grant numbers 474231/2010-0 and 305070/2011-8. The authors also thank Prof. Costas Panagiotakis for providing the Tarragona, Census and Eia datasets.

Author information

Correspondence to Daniel Aloise.

About this article

Cite this article

Aloise, D., Hansen, P., Rocha, C. et al. Column generation bounds for numerical microaggregation. J Glob Optim 60, 165–182 (2014). https://doi.org/10.1007/s10898-014-0149-3

DOI: https://doi.org/10.1007/s10898-014-0149-3
