Abstract
Inference control for protecting the privacy of microdata (individual data) should try to optimize the tradeoff between data utility (low information loss) and protection against disclosure (low disclosure risk). Whereas risk measures are bounded between 0 and 1, the information loss measures proposed in the literature for continuous data are unbounded, which makes it awkward to trade off information loss against disclosure risk. In this paper, we propose using probabilities to define bounded information loss measures for continuous microdata.
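As a rough illustration of how a probability can bound a loss measure, the sketch below standardizes the gap between a statistic on the original data and the same statistic on the masked data, then reports the standard normal probability mass within that gap. It is a minimal sketch under stated assumptions, not the definition used in the paper: the function name `bounded_loss_for_mean`, the choice of the mean as the statistic, and the normal approximation for the sampling error are all hypothetical choices made for this example.

```python
import math
import statistics


def bounded_loss_for_mean(original, masked):
    """Illustrative loss in [0, 1] for the mean of one continuous attribute.

    Assumption (not taken from the paper): the masked mean is treated as
    approximately normal around the original mean, with standard error
    sqrt(s^2 / n), where s^2 is the sample variance of the original data.
    """
    n = len(masked)
    theta = statistics.fmean(original)      # statistic on the original data
    theta_hat = statistics.fmean(masked)    # same statistic on the masked data
    std_err = math.sqrt(statistics.variance(original) / n)
    if std_err == 0.0:
        # Degenerate case: constant original attribute.
        return 0.0 if theta_hat == theta else 1.0
    z = abs(theta_hat - theta) / std_err
    # P(|Z| <= z) for a standard normal Z; always lies in [0, 1].
    return math.erf(z / math.sqrt(2.0))


if __name__ == "__main__":
    original = [1.0, 2.0, 3.0, 4.0, 5.0]
    slightly_masked = [x + 0.1 for x in original]
    heavily_masked = [x + 100.0 for x in original]
    print(bounded_loss_for_mean(original, original))
    print(bounded_loss_for_mean(original, slightly_masked))
    print(bounded_loss_for_mean(original, heavily_masked))
```

Because the result is a probability, it lives on the same 0 to 1 scale as a disclosure risk measure, so the two can be compared or combined directly. An unbounded measure such as a raw absolute difference would not allow this without an ad hoc rescaling.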
Acknowledgments
Thanks go to Jordi Castellà for his help in preparing the web form http://vneumann.etse.urv.es/SDC/measures. Also, comments by William Winkler greatly helped improve the presentation of this paper. This work was partly funded by the Spanish Ministry of Science and Technology and the European FEDER Fund under project TIC2001-0633-C03-01 “STREAMOBILE” and also by the Spanish Ministry of Education and Science under project SEG2004-04352-C04-01 “PROPRIETAS”.
Editor: Geoff Webb
Cite this article
Mateo-Sanz, J.M., Domingo-Ferrer, J. & Sebé, F. Probabilistic Information Loss Measures in Confidentiality Protection of Continuous Microdata. Data Min Knowl Disc 11, 181–193 (2005). https://doi.org/10.1007/s10618-005-0011-9
DOI: https://doi.org/10.1007/s10618-005-0011-9