Abstract
Optimally micro-aggregating a multivariate data set is known to be NP-hard, so heuristic approaches are used to cope with this privacy-preserving problem. Unfortunately, the algorithms in the literature are computationally costly, which prevents their use on large data sets.
We propose a partitioning algorithm to micro-aggregate very large uniform data sets with cost O(n). We provide the mathematical foundations proving the efficiency of our algorithm, and we show that the error associated with micro-aggregation is bounded and decreases as the number of micro-aggregated records grows. The experimental results confirm the predictions of the mathematical analysis. In addition, we compare our proposal with MDAV, a well-known micro-aggregation algorithm with cost O(n²).
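To make the idea of linear-time partitioning concrete, here is a minimal Python sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes records lie in the unit hypercube [0, 1]^d, splits each dimension into a fixed number of equal cells, and replaces every record by the centroid of its cell. The names `grid_microaggregate`, `sse`, and `cells_per_dim` are hypothetical. The sketch also does not enforce a minimum group size k; it only reports the smallest group so that the reader can check it.

```python
import random
from collections import defaultdict


def grid_microaggregate(records, cells_per_dim):
    """Replace each record in [0, 1]^d by the centroid of its grid cell.

    Illustrative assumption, not the paper's method: a fixed uniform grid.
    Both passes touch each record once, so the cost is O(n * d).
    """
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        # Cell index per dimension; min() guards a coordinate equal to 1.0.
        key = tuple(min(int(x * cells_per_dim), cells_per_dim - 1) for x in rec)
        groups[key].append(idx)

    masked = [None] * len(records)
    for members in groups.values():
        dim = len(records[members[0]])
        centroid = tuple(
            sum(records[i][j] for i in members) / len(members)
            for j in range(dim)
        )
        for i in members:
            masked[i] = centroid
    return masked, groups


def sse(records, masked):
    """Sum of squared distances between original and masked records."""
    return sum(
        (a - b) ** 2
        for rec, agg in zip(records, masked)
        for a, b in zip(rec, agg)
    )


# Usage on synthetic uniform data (hypothetical parameters).
random.seed(0)
data = [(random.random(), random.random()) for _ in range(10000)]
masked, groups = grid_microaggregate(data, cells_per_dim=10)
smallest_group = min(len(g) for g in groups.values())
print("smallest group size:", smallest_group)
print("mean squared error per record:", sse(data, masked) / len(data))
```

In this sketch every record stays inside its cell, so the squared error per record is at most d times the squared cell width. A real micro-aggregation method would additionally have to merge or resize cells holding fewer than k records.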
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Solanas, A., Di Pietro, R. (2008). A Linear-Time Multivariate Micro-aggregation for Privacy Protection in Uniform Very Large Data Sets. In: Torra, V., Narukawa, Y. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2008. Lecture Notes in Computer Science, vol 5285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88269-5_19
DOI: https://doi.org/10.1007/978-3-540-88269-5_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88268-8
Online ISBN: 978-3-540-88269-5
eBook Packages: Computer Science, Computer Science (R0)