Analysis of the Univariate Microaggregation Disclosure Risk

Nin, Jordi; Torra, Vicenç

doi:10.1007/s00354-007-0061-1

Published: 07 August 2009

New Generation Computing, Volume 27, pages 197–214 (2009)

Abstract

Microaggregation is a protection method used by statistical agencies to limit the disclosure risk of confidential information. Formally, microaggregation assigns each original datum to a small cluster and then replaces the original data with the centroid of that cluster. Because each cluster contains at least k records, microaggregation can be regarded as preserving k-anonymity. However, this holds only when multivariate microaggregation is applied and all variables are microaggregated at the same time.
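As an illustration of the cluster-and-centroid step described above, the following is a minimal sketch of univariate microaggregation. It assumes a simple fixed-size heuristic (sort the values, cut them into consecutive groups of k, let the last group absorb any remainder); the function name `microaggregate` and the sample data are hypothetical, and this is not claimed to be the optimal partition or the exact algorithm evaluated in the paper.

```python
from statistics import mean

def microaggregate(values, k):
    """Fixed-size univariate microaggregation (illustrative heuristic only)."""
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    # Indices of the records, ordered by their value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    protected = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder so every group has >= k records.
        end = (g + 1) * k if g < n_groups - 1 else len(values)
        group = order[start:end]
        centroid = mean(values[i] for i in group)
        for i in group:
            protected[i] = centroid  # replace each value by its group centroid
    return protected

# Hypothetical sample variable with seven records and k = 3.
ages = [23, 45, 31, 67, 29, 52, 38]
protected = microaggregate(ages, 3)
print(protected)
```

Every protected value is shared by at least k records, which is the sense in which a single microaggregated variable is k-anonymous.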

When different variables are protected using univariate microaggregation, k-anonymity is ensured only at the variable level. The real k-anonymity therefore decreases for most records, and privacy leakage becomes possible. For this reason, analysing disclosure risk remains meaningful for microaggregation.
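The loss of record-level k-anonymity can be seen on a toy example. The sketch below assumes the same simple fixed-size grouping heuristic (sorted values cut into consecutive groups of k) applied independently to two hypothetical variables; the names `microaggregate` and `record_level_k`, and the data, are illustrative assumptions rather than anything taken from the paper.

```python
from collections import Counter

def microaggregate(values, k):
    """Fixed-size univariate microaggregation (illustrative heuristic only)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    protected = [0.0] * len(values)
    for g in range(n_groups):
        end = (g + 1) * k if g < n_groups - 1 else len(values)
        group = order[g * k:end]
        centroid = sum(values[i] for i in group) / len(group)
        for i in group:
            protected[i] = centroid
    return protected

def record_level_k(columns, k):
    """Smallest number of records sharing the same protected tuple."""
    protected_cols = [microaggregate(col, k) for col in columns]
    counts = Counter(zip(*protected_cols))
    return min(counts.values())

# Hypothetical data: each variable alone is 3-anonymous after protection.
age = [20, 22, 24, 60, 62, 64]
income = [10, 50, 11, 51, 12, 52]
print(record_level_k([age], 3))          # one variable on its own
print(record_level_k([age, income], 3))  # both variables combined
```

In this example each variable has groups of three identical protected values, yet some combinations of the two protected values are held by a single record, so the combined file is no longer 3-anonymous.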

This paper proposes a new record linkage method for univariate microaggregation based on finding the optimal alignment between the original and the protected sorted variables. We show that our method, which uses a DTW (dynamic time warping) distance to compute the optimal alignment, in many cases provides the intruder with enough information to decide whether a link is correct. Note that standard record linkage methods never ensure the correctness of the linkage. Furthermore, we present experiments on two well-known data sets, which show that our method obtains better results (a larger number of correct links) than the best standard record linkage method.
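To make the alignment idea concrete, here is a minimal sketch of a textbook DTW computation between a sorted original variable and the sorted centroids of its protected version. It assumes an absolute-difference local cost and the classic three-step recurrence; the function name `dtw` and the sample values are hypothetical, and the paper's actual linkage procedure and its criterion for deciding that a link is correct are not reproduced here.

```python
def dtw(a, b):
    """Classic dynamic time warping: returns (distance, alignment path)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = cost of the best alignment of a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end to recover which elements were aligned.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(D[i - 1][j - 1], i - 1, j - 1),
                 (D[i - 1][j], i - 1, j),
                 (D[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return D[n][m], path

# Hypothetical sorted original values and the sorted centroids of their
# microaggregated version (two groups of three records each).
original_sorted = [1, 2, 3, 10, 11, 12]
centroids_sorted = [2, 11]
distance, path = dtw(original_sorted, centroids_sorted)
print(distance)
print(path)
```

In this toy case the alignment maps the first three original values to the first centroid and the last three to the second, which recovers the group membership of every original value.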

Author information

Authors and Affiliations

IIIA, Artificial Intelligence Research Institute, CSIC, Spanish National Research Council, Campus UAB s/n, 08193, Bellaterra, Catalonia, Spain
Jordi Nin & Vicenç Torra

Corresponding author

Correspondence to Jordi Nin.

About this article

Cite this article

Nin, J., Torra, V. Analysis of the Univariate Microaggregation Disclosure Risk. New Gener. Comput. 27, 197–214 (2009). https://doi.org/10.1007/s00354-007-0061-1

Received: 19 November 2007
Revised: 07 November 2008
Published: 07 August 2009
Issue Date: May 2009
DOI: https://doi.org/10.1007/s00354-007-0061-1
