Probabilistic Information Loss Measures in Confidentiality Protection of Continuous Microdata

Mateo-Sanz, Josep M.; Domingo-Ferrer, Josep; Sebé, Francesc

doi:10.1007/s10618-005-0011-9

Published: 02 September 2005

Volume 11, pages 181–193, (2005)

Data Mining and Knowledge Discovery

Josep M. Mateo-Sanz¹,
Josep Domingo-Ferrer¹ &
Francesc Sebé¹

Abstract

Inference control for protecting the privacy of microdata (individual data) should try to optimize the tradeoff between data utility (low information loss) and protection against disclosure (low disclosure risk). Whereas risk measures are bounded between 0 and 1, information loss measures proposed in the literature for continuous data are unbounded, which makes it awkward to trade off information loss for disclosure risk. We propose in this paper to use probabilities to define bounded information loss measures for continuous microdata.
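To illustrate what a bounded, probability-based loss measure could look like, the following is a minimal sketch for a single statistic, the sample mean. The abstract does not give the formula, so everything here is an assumption made for illustration: the function name `probabilistic_loss_for_mean`, the choice of the sample mean, and the idea of standardizing the gap between masked and original means by the standard error and mapping it to a probability under a standard normal law. It is not presented as the authors' exact definition.

```python
import math
from statistics import mean, variance


def probabilistic_loss_for_mean(original, masked):
    """Hypothetical bounded loss in [0, 1] for the sample mean.

    Assumption (not taken from the abstract): the gap between the masked
    and original means is standardized by the standard error of the
    original mean and mapped to a probability via the standard normal law.
    """
    n = len(original)
    if n < 2:
        raise ValueError("need at least two original records")
    std_error = math.sqrt(variance(original) / n)
    gap = abs(mean(masked) - mean(original))
    if std_error == 0.0:
        # Degenerate case: the original attribute is constant.
        return 0.0 if gap == 0.0 else 1.0
    z = gap / std_error
    # P(-z <= N(0,1) <= z) = erf(z / sqrt(2)), which always lies in [0, 1].
    return math.erf(z / math.sqrt(2.0))


# Illustrative data only; these values are made up for the example.
original = [1.0, 2.0, 3.0, 4.0, 5.0]
slightly_masked = [x + 0.1 for x in original]
heavily_masked = [x + 100.0 for x in original]

loss_none = probabilistic_loss_for_mean(original, original)
loss_small = probabilistic_loss_for_mean(original, slightly_masked)
loss_large = probabilistic_loss_for_mean(original, heavily_masked)
```

Because the result is a probability, it shares the 0 to 1 scale of a disclosure risk measure, which is the property the abstract argues makes the two easier to trade off against each other.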


Acknowledgments

Thanks go to Jordi Castellà for his help in preparing the web form http://vneumann.etse.urv.es/SDC/measures. Comments by William Winkler also greatly helped improve the presentation of this paper. This work was partly funded by the Spanish Ministry of Science and Technology and the European FEDER Fund under project TIC2001-0633-C03-01 “STREAMOBILE” and also by the Spanish Ministry of Education and Science under project SEG2004-04352-C04-01 “PROPRIETAS”.

Author information

Authors and Affiliations

Department of Computer Engineering and Mathematics, Rovira i Virgili University of Tarragona, Av. Països Catalans 26, E-43007, Tarragona, Catalonia, Spain
Josep M. Mateo-Sanz, Josep Domingo-Ferrer & Francesc Sebé

Corresponding author

Correspondence to Josep M. Mateo-Sanz.

Additional information

Editor: Geoff Webb

About this article

Cite this article

Mateo-Sanz, J.M., Domingo-Ferrer, J. & Sebé, F. Probabilistic Information Loss Measures in Confidentiality Protection of Continuous Microdata. Data Min Knowl Disc 11, 181–193 (2005). https://doi.org/10.1007/s10618-005-0011-9

Issue Date: September 2005
