
Robustification of Microdata Masking Methods and the Comparison with Existing Methods

  • Conference paper
Privacy in Statistical Databases (PSD 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5262)


Abstract

The aim of this study was to compare different microdata protection methods for numerical variables under various conditions. Most of the methods used in this paper have been implemented in the R package sdcMicro, which is freely available from the Comprehensive R Archive Network (CRAN, http://cran.r-project.org). The remaining methods can easily be applied using other R packages. While most methods work well for homogeneous data sets, some fail completely when the confidential variables contain outliers, which is almost always the case with data from official statistics. To overcome these problems, we have robustified popular methods such as microaggregation and shuffling, the latter being based on a regression model. All methods have been tested on bivariate data sets featuring different outlier scenarios. Additionally, a simulation study was performed.
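To illustrate why outliers cause trouble for classical microaggregation, here is a minimal sketch in Python. The abstract does not describe the authors' robustified algorithm, so this is not that algorithm. It assumes the simplest univariate variant: records are sorted on one confidential variable, consecutive groups of k records are formed (the last group absorbs any remainder), and each value is replaced by an aggregate of its group. Passing `statistics.mean` gives the classical method; passing `statistics.median` is a hypothetical robust stand-in chosen only for illustration. The function name `microaggregate` and the toy income data are hypothetical.

```python
from statistics import mean, median
from typing import Callable, List, Sequence


def microaggregate(values: Sequence[float], k: int = 3,
                   aggregate: Callable[[List[float]], float] = mean) -> List[float]:
    """Hypothetical univariate microaggregation helper (illustration only).

    Records are sorted by value, split into consecutive groups of k records
    (the last group absorbs any remainder, so it holds k to 2k-1 records),
    and every value is replaced by the aggregate of its group.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    vals = [float(v) for v in values]
    n = len(vals)
    if n < k:
        raise ValueError("need at least k records")
    # Record indices in ascending order of the confidential value.
    order = sorted(range(n), key=lambda i: vals[i])
    n_groups = n // k
    masked = [0.0] * n
    for g in range(n_groups):
        start = g * k
        end = n if g == n_groups - 1 else start + k
        members = order[start:end]
        value = aggregate([vals[i] for i in members])
        for i in members:
            masked[i] = value  # write back in the original record order
    return masked


if __name__ == "__main__":
    # Hypothetical toy incomes with a single large outlier (950).
    incomes = [24, 950, 21, 29, 31, 26, 23, 30, 27]
    print(microaggregate(incomes, k=3, aggregate=mean))    # classical: group means
    print(microaggregate(incomes, k=3, aggregate=median))  # robust stand-in: group medians
```

With the toy data, the mean-based version publishes the two ordinary records 30 and 31 as 337.0 because they share a group with the outlier 950, whereas the median-based version publishes all three records of that group as 31.0. The trade-off is that the median no longer preserves the group total, which the mean does by construction.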



Editor information

Josep Domingo-Ferrer, Yücel Saygın


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Templ, M., Meindl, B. (2008). Robustification of Microdata Masking Methods and the Comparison with Existing Methods. In: Domingo-Ferrer, J., Saygın, Y. (eds) Privacy in Statistical Databases. PSD 2008. Lecture Notes in Computer Science, vol 5262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87471-3_10


  • DOI: https://doi.org/10.1007/978-3-540-87471-3_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87470-6

  • Online ISBN: 978-3-540-87471-3

  • eBook Packages: Computer Science, Computer Science (R0)
