Probabilistic Information Loss Measures in Confidentiality Protection of Continuous Microdata

Data Mining and Knowledge Discovery

Abstract

Inference control for protecting the privacy of microdata (individual data) should try to optimize the tradeoff between data utility (low information loss) and protection against disclosure (low disclosure risk). Whereas risk measures are bounded between 0 and 1, information loss measures proposed in the literature for continuous data are unbounded, which makes it awkward to trade off information loss for disclosure risk. We propose in this paper to use probabilities to define bounded information loss measures for continuous microdata.
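As a rough illustration of what a probability-based, bounded loss measure can look like, the sketch below compares the sample mean of an original and a masked variable. It is only a sketch under stated assumptions, not the definition given in the full article: the function name `bounded_loss_for_mean`, the choice of the mean as the statistic, the use of the original data to estimate the standard error, and the normal approximation are all assumptions made here for illustration. The unbounded discrepancy is standardized and mapped to the probability 2·P(0 ≤ Z ≤ z) for a standard normal Z, which always lies in [0, 1].

```python
import math
from statistics import mean, variance


def bounded_loss_for_mean(original, masked):
    """Illustrative loss in [0, 1] for the sample mean (hypothetical helper).

    Assumption: the discrepancy between the masked and original means is
    standardized by the standard error of the original mean and treated
    as approximately standard normal.
    """
    n = len(original)
    if n != len(masked) or n < 2:
        raise ValueError("need two samples of equal length >= 2")

    # Standard error of the mean, estimated from the original data.
    se = math.sqrt(variance(original) / n)
    diff = abs(mean(masked) - mean(original))

    if se == 0.0:
        # Degenerate case: constant original data.
        return 0.0 if diff == 0.0 else 1.0

    z = diff / se
    # For Z ~ N(0, 1), 2 * P(0 <= Z <= z) equals erf(z / sqrt(2)).
    return math.erf(z / math.sqrt(2.0))


if __name__ == "__main__":
    original = [10.0, 12.0, 9.0, 11.0, 13.0]
    lightly_masked = [10.1, 11.9, 9.0, 11.0, 13.1]
    heavily_masked = [x + 100.0 for x in original]
    print(bounded_loss_for_mean(original, original))
    print(bounded_loss_for_mean(original, lightly_masked))
    print(bounded_loss_for_mean(original, heavily_masked))
```

Because the result lives on the same 0 to 1 scale as a disclosure risk measure, the two can be compared or combined directly, which is the motivation stated in the abstract.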

Acknowledgments

Thanks go to Jordi Castellà for his help in preparing the web form at http://vneumann.etse.urv.es/SDC/measures. Comments by William Winkler also greatly helped to improve the presentation of this paper. This work was partly funded by the Spanish Ministry of Science and Technology and the European FEDER Fund under project TIC2001-0633-C03-01 “STREAMOBILE”, and by the Spanish Ministry of Education and Science under project SEG2004-04352-C04-01 “PROPRIETAS”.

Author information

Corresponding author

Correspondence to Josep M. Mateo-Sanz.

Additional information

Editor: Geoff Webb

About this article

Cite this article

Mateo-Sanz, J.M., Domingo-Ferrer, J. & Sebé, F. Probabilistic Information Loss Measures in Confidentiality Protection of Continuous Microdata. Data Min Knowl Disc 11, 181–193 (2005). https://doi.org/10.1007/s10618-005-0011-9

  • DOI: https://doi.org/10.1007/s10618-005-0011-9
