Robust Statistics Meets SDC: New Disclosure Risk Measures for Continuous Microdata Masking

  • Conference paper
Privacy in Statistical Databases (PSD 2008)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5262))

Abstract

The aim of this study is to evaluate the risk of re-identification related to distance-based disclosure risk measures for numerical variables. First, we review previously proposed disclosure risk measures. Unfortunately, none of these measures accounts for outliers. We assume that outliers must be protected more than observations near the center of the data cloud. Therefore, we propose a weighting scheme for each observation based on the concept of robust Mahalanobis distances. We also consider the peculiarities of different protection methods and adapt our measures so that they give realistic results for each method. To test the proposed distance-based disclosure risk measures, we run a simulation study with different amounts of data contamination. The results of the simulation study show the usefulness of the proposed measures and give deeper insight into how the risk of quantitative data can be measured successfully. All methods proposed in this paper, as well as all protection methods and measures used, are implemented in the R package sdcMicro, which is freely available on the Comprehensive R Archive Network (http://cran.r-project.org).
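To make the idea of weighting observations by a robust distance more concrete, the following is a minimal sketch and not the authors' implementation (which is in the R package sdcMicro). It rests on several assumptions that do not come from the paper: the robust center is the coordinate-wise median, the robust scale is the MAD, and correlations between variables are ignored, which is a simplified stand-in for a covariance-based robust Mahalanobis distance. The function names `robust_distances` and `outlier_weights`, the weight formula, and the toy data are all hypothetical.

```python
import math
from statistics import median


def robust_distances(rows):
    """Distance of each row from a robust center.

    Assumption: center = coordinate-wise median, scale = 1.4826 * MAD,
    scatter treated as diagonal (correlations ignored). This simplifies
    the robust Mahalanobis distance named in the abstract.
    """
    cols = list(zip(*rows))
    centers = [median(col) for col in cols]
    scales = []
    for col, center in zip(cols, centers):
        mad = median([abs(v - center) for v in col])
        if mad == 0:
            raise ValueError("MAD is zero; this simple scale estimate breaks down")
        scales.append(1.4826 * mad)  # consistency factor for normal data
    return [
        math.sqrt(sum(((v - c) / s) ** 2 for v, c, s in zip(row, centers, scales)))
        for row in rows
    ]


def outlier_weights(distances, cutoff):
    """Hypothetical weighting: 1 inside the cutoff, growing linearly beyond it."""
    return [max(1.0, d / cutoff) for d in distances]


# Toy data: five observations near the center and one clear outlier.
rows = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.1), (1.1, 2.2), (1.0, 1.8), (8.0, 9.0)]

# Square root of the 0.975 chi-square quantile; this closed form
# holds for 2 degrees of freedom only (two variables).
cutoff = math.sqrt(-2.0 * math.log(1.0 - 0.975))

distances = robust_distances(rows)
weights = outlier_weights(distances, cutoff)
for row, d, w in zip(rows, distances, weights):
    print(row, round(d, 2), round(w, 2))
```

Because the median and MAD are barely affected by the single extreme point, that point receives a large distance and therefore a weight above 1, while the observations near the center keep weight 1. A non-robust mean and standard deviation would be pulled toward the outlier and could mask it.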



Editor information

Josep Domingo-Ferrer, Yücel Saygın

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Templ, M., Meindl, B. (2008). Robust Statistics Meets SDC: New Disclosure Risk Measures for Continuous Microdata Masking. In: Domingo-Ferrer, J., Saygın, Y. (eds) Privacy in Statistical Databases. PSD 2008. Lecture Notes in Computer Science, vol 5262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87471-3_15

  • DOI: https://doi.org/10.1007/978-3-540-87471-3_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87470-6

  • Online ISBN: 978-3-540-87471-3

  • eBook Packages: Computer Science, Computer Science (R0)
