Abstract
The biggest challenge when disclosing private data is to share information contained in databases while protecting people from being individually identified. Microaggregation is a family of methods for statistical disclosure control. The principle of microaggregation is that confidentiality rules permit the publication of individual records if they are partitioned into groups of size greater than or equal to a fixed threshold value, where no record is more representative than the others in the same group. Applying such rules leads to replacing individual values by values computed from small groups (microaggregates) before data publication. This work proposes a column generation algorithm for numerical microaggregation whose pricing problem is solved by a specialized branch-and-bound. The algorithm is able to find, for the first time, lower bounds for instances of three real-world datasets commonly used in the literature. Furthermore, new best known solutions are obtained for these instances by means of a simple heuristic method that uses the generated columns.
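The masking step described above can be illustrated with a minimal sketch. It assumes that each group's published value is its centroid and that information loss is measured by the within-group sum of squared errors; the abstract states neither detail, so both are assumptions, and the function name `microaggregate` and the toy data are hypothetical. The sketch takes a partition as given and does not implement the column generation algorithm.

```python
from typing import List, Sequence, Tuple


def microaggregate(
    records: Sequence[Sequence[float]],
    groups: Sequence[Sequence[int]],
    k: int,
) -> Tuple[List[List[float]], float]:
    """Replace each record by its group centroid (assumed aggregate).

    Returns the masked records and the within-group sum of squared
    errors, used here as an assumed measure of information loss.
    """
    n = len(records)
    # The groups must cover every record index exactly once.
    if sorted(i for g in groups for i in g) != list(range(n)):
        raise ValueError("groups must partition the record indices")
    # Confidentiality rule: every group has at least k records.
    if any(len(g) < k for g in groups):
        raise ValueError("every group must contain at least k records")

    dim = len(records[0])
    masked: List[List[float]] = [[] for _ in range(n)]
    sse = 0.0
    for g in groups:
        centroid = [sum(records[i][d] for i in g) / len(g) for d in range(dim)]
        for i in g:
            masked[i] = list(centroid)
            sse += sum((records[i][d] - centroid[d]) ** 2 for d in range(dim))
    return masked, sse


# Toy data: four two-dimensional records, threshold k = 2.
toy_records = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [12.0, 14.0]]
toy_masked, toy_sse = microaggregate(toy_records, [[0, 1], [2, 3]], k=2)
print(toy_masked)  # the two centroids, each published twice
print(toy_sse)     # 14.0
```

Under these assumptions, a lower partition cost means the published data stay closer to the original values, which is why lower bounds on that cost are useful for judging heuristic solutions.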
Acknowledgments
Research of the first author has been supported by the National Council for Scientific and Technological Development—CNPq/Brazil Grant Numbers 474231/2010-0 and 305070/2011-8. The authors also thank Prof. Costas Panagiotakis for providing the Tarragona, Census and Eia datasets.
About this article
Cite this article
Aloise, D., Hansen, P., Rocha, C. et al. Column generation bounds for numerical microaggregation. J Glob Optim 60, 165–182 (2014). https://doi.org/10.1007/s10898-014-0149-3
DOI: https://doi.org/10.1007/s10898-014-0149-3