Abstract
Microaggregation is one of the most commonly employed microdata protection methods. It builds clusters of at least k original records and replaces the records in each cluster with the cluster centroid. When records are complex, i.e., when the data set has a large number of attributes, the data set is usually split into smaller blocks of attributes, and microaggregation is applied to each block successively and independently. This reduces the information loss incurred when several values are collapsed to the centroid of their group, at the cost of losing the k-anonymity property when the intruder knows at least two attributes from different blocks.
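As an illustration of the basic idea only, the following minimal sketch clusters records into groups of at least k and replaces each record with its group centroid. The function name `microaggregate` and the ordering heuristic (sorting records by the sum of their attributes) are assumptions made for this example; they are not taken from the chapter, and practical methods choose clusters more carefully.

```python
def microaggregate(records, k):
    """Replace every record with the centroid of a cluster of >= k records.

    Hypothetical sketch: clusters are formed by sorting records on the sum
    of their attributes and cutting the ordering into consecutive groups.
    """
    n = len(records)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k records")

    n_attrs = len(records[0])
    # Assumed heuristic: a crude one-dimensional ordering of the records.
    order = sorted(range(n), key=lambda i: sum(records[i]))

    n_groups = n // k
    protected = [None] * n
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder, so every group has >= k members.
        end = start + k if g < n_groups - 1 else n
        members = order[start:end]
        centroid = tuple(
            sum(records[i][j] for i in members) / len(members)
            for j in range(n_attrs)
        )
        for i in members:
            protected[i] = centroid
    return protected


data = [(1.0, 2.0), (2.0, 3.0), (10.0, 10.0), (11.0, 12.0), (12.0, 11.0)]
print(microaggregate(data, k=2))
```

Because every published record is shared by at least k originals, an intruder cannot single out one record within its cluster. Applying the same procedure separately to blocks of attributes weakens that guarantee, as the abstract notes.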
In this work, we present a new microaggregation method called One dimension microaggregation (Mic1D-κ). This method gathers all the values of the data set into a single sorted vector, regardless of the attribute they belong to, and then microaggregates all the mixed values together. Our experiments on real data show that our proposal achieves lower disclosure risk than previous approaches while preserving the level of information loss.
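The description above can be sketched roughly as follows. This is not the authors' implementation: the function name `mic1d_sketch`, the fixed-size grouping of κ consecutive values, and the assumption that attributes are already on a comparable scale are all choices made for this example, since the abstract does not specify them.

```python
def mic1d_sketch(records, kappa):
    """Pool all values into one sorted vector and microaggregate them together.

    Hypothetical sketch: groups are runs of kappa consecutive sorted values,
    and attributes are assumed to be on a comparable scale already.
    """
    n_attrs = len(records[0])
    # Flatten the data set, remembering the (row, column) of every value.
    cells = [
        (value, r, c)
        for r, row in enumerate(records)
        for c, value in enumerate(row)
    ]
    if kappa < 1 or len(cells) < kappa:
        raise ValueError("need kappa >= 1 and at least kappa values")

    # Single sorted vector, ignoring which attribute each value came from.
    cells.sort(key=lambda cell: cell[0])

    protected = [[None] * n_attrs for _ in records]
    n_groups = len(cells) // kappa
    for g in range(n_groups):
        start = g * kappa
        # The last group absorbs the remainder, so every group has >= kappa values.
        end = start + kappa if g < n_groups - 1 else len(cells)
        group = cells[start:end]
        mean = sum(value for value, _, _ in group) / len(group)
        # Write the group mean back to each value's original position.
        for _, r, c in group:
            protected[r][c] = mean
    return protected


data = [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0]]
print(mic1d_sketch(data, kappa=3))
```

In this sketch κ counts individual values rather than whole records, so a single group can mix values from different attributes and different records.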
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pont-Tuset, J., Nin, J., Medrano-Gracia, P., Larriba-Pey, J.L., Muntés-Mulero, V. (2008). Improving Microaggregation for Complex Record Anonymization. In: Torra, V., Narukawa, Y. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2008. Lecture Notes in Computer Science, vol 5285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88269-5_20
DOI: https://doi.org/10.1007/978-3-540-88269-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88268-8
Online ISBN: 978-3-540-88269-5
eBook Packages: Computer Science, Computer Science (R0)