Abstract
The release of survey microdata files requires a preliminary assessment of the disclosure risk of the data. Record-level risk measures are useful for “local” protection (e.g. partially synthetic data [21] or local suppression [25]), and are also used in [22] and [16] to produce global risk measures [13] for assessing data release. Although various proposals for estimating such risk measures are available in the literature, so far only a few attempts have been made to evaluate the statistical properties of these estimators. In this paper we carry out a simulation study to evaluate these properties. Besides presenting results on the Benedetti-Franconi individual risk estimator (see [11]), we also propose a strategy for producing improved risk estimates, and assess it by simulation.
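As an illustration of what a record-level risk measure can look like, the following sketch computes the expected value of 1/F_k given the sample frequency f_k, where F_k denotes the population frequency of a record's combination of key variables. It assumes, as we understand the individual risk methodology reviewed in [11], that F_k given f_k follows a negative binomial distribution with success probability p_k. The function name, its arguments, and the direct numerical summation are illustrative choices and are not taken from the paper; how p_k is estimated from the sampling weights is left outside the sketch.

```python
import math


def individual_risk(f_k, p_k, tol=1e-12, max_terms=1_000_000):
    """Approximate E[1/F_k | f_k] by direct summation.

    Assumed model (hypothetical helper, not the paper's code):
        P(F_k = j | f_k) = C(j-1, f_k-1) * p_k**f_k * (1-p_k)**(j-f_k),  j >= f_k
    """
    if f_k < 1:
        raise ValueError("f_k must be a positive integer")
    if not 0.0 < p_k <= 1.0:
        raise ValueError("p_k must lie in (0, 1]")
    if p_k == 1.0:
        return 1.0 / f_k  # the sample cell is the whole population cell

    q = 1.0 - p_k
    prob = p_k ** f_k  # P(F_k = f_k | f_k)
    risk, mass, j = 0.0, 0.0, f_k
    for _ in range(max_terms):
        risk += prob / j
        mass += prob
        if 1.0 - mass < tol:  # remaining probability mass is negligible
            break
        # Ratio of consecutive negative binomial probabilities.
        prob *= q * j / (j - f_k + 1)
        j += 1
    return risk


# For a sample unique (f_k = 1) the sum has the closed form
# p/(1-p) * log(1/p), which gives a quick consistency check.
p = 0.5
closed_form = p / (1 - p) * math.log(1 / p)
print(individual_risk(1, p), closed_form)
```

For very small p_k the summation needs many terms and is truncated at max_terms, so a closed-form or series expression would be preferable in practice.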
The problem of estimating per-record reidentification risk shares many similarities with small area estimation (see [19]); we therefore propose to introduce external information, arising from a previous census, into risk estimation. To achieve this we consider a simple strategy, namely the Structure Preserving Estimation (SPREE) of Purcell and Kish [18], and show by simulation that this procedure provides better estimates of the individual risk of reidentification disclosure, especially for records whose risk is high.
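The general idea behind SPREE can be sketched with iterative proportional fitting: a cross-classification from a previous census supplies the association structure, and the table is rescaled until it matches more recent margins. The sketch below is a minimal two-way version under stated assumptions; the abstract does not specify which margins or table layout the authors use, so the function name, the toy counts, and the target margins are all hypothetical.

```python
def spree_ipf(census, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Rescale a two-way census table to new row and column margins.

    The census counts act as the starting values, so the fitted table
    keeps the census association structure (cross-product ratios).
    Illustrative helper only; not the authors' implementation.
    """
    table = [[float(x) for x in row] for row in census]
    for _ in range(max_iter):
        # Step 1: scale each row to its target total.
        for i, row in enumerate(table):
            total = sum(row)
            if total > 0:
                factor = row_targets[i] / total
                table[i] = [x * factor for x in row]
        # Step 2: scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                factor = target / total
                for row in table:
                    row[j] *= factor
        # Columns now match; stop once the rows match as well.
        err = max(abs(sum(row) - t) for row, t in zip(table, row_targets))
        if err < tol:
            break
    return table


# Toy example: census counts by (key-variable cell, area), with
# hypothetical updated margins that share the same grand total.
census = [[40, 10], [20, 30]]
fitted = spree_ipf(census, row_targets=[60, 50], col_targets=[70, 40])
for row in fitted:
    print([round(x, 3) for x in row])
```

In a risk-estimation setting, the fitted cell counts could serve as estimates of the population frequencies that enter the individual risk; that use is our reading of the proposal rather than a detail given in the abstract.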
References
Abramowitz, M., Stegun, I.A.: Handbook of Mathematical Functions. Dover, New York (1965)
Benedetti, R., Franconi, L.: Statistical and technological solutions for controlled data dissemination. In: Pre-proceedings of New Techniques and Technologies for Statistics, Sorrento, June 4-6, 1998, vol. 1, pp. 225–232 (1998)
Carlson, M.: Assessing microdata disclosure risk using the Poisson-inverse Gaussian distribution. Statistics in Transition 5, 901–925 (2002)
Chen, G., Keller-McNulty, S.: Estimation of identification disclosure risk in microdata. Journal of Official Statistics 14, 79–95 (1998)
Deville, J.C., Särndal, C.E.: Calibration estimators in survey sampling. Journal of the American Statistical Association 87, 367–382 (1992)
Di Consiglio, L., Franconi, L., Seri, G.: Assessing individual risk of disclosure: an experiment. In: Proceedings of the Joint ECE/Eurostat Work Session on Statistical Data Confidentiality, Luxembourg, April 7-9 (2003)
Duncan, G.T., Lambert, D.: Disclosure-limited data dissemination (with comments). Journal of the American Statistical Association 81, 10–27 (1986)
Elamir, E.A.H., Skinner, C.J.: Modeling the re-identification risk per record in microdata. In: 54th Session of the International Statistical Institute, Berlin, August 13-20 (2003)
Fienberg, S.E., Makov, U.E.: Confidentiality, uniqueness, and disclosure limitation for categorical data. Journal of Official Statistics 14, 385–397 (1998)
Forster, J.J.: Bayesian methods for disclosure risk assessment. In: Proceedings of the Joint ECE/Eurostat Work Session on Statistical Data Confidentiality, Geneva, November 9-11, 2005, pp. 99–108. Luxembourg (2005)
Franconi, L., Polettini, S.: Individual risk estimation in μ-Argus: A review. In: Domingo-Ferrer, J., Torra, V. (eds.) PSD 2004. LNCS, vol. 3050, pp. 262–272. Springer, Heidelberg (2004)
Hundepool, A.: The CASC Project. In: Domingo-Ferrer, J. (ed.) Inference Control in Statistical Databases. LNCS, vol. 2316, pp. 172–180. Springer, Heidelberg (2002)
Lambert, D.: Measures of disclosure risk and harm. Journal of Official Statistics 9, 313–331 (1993)
Madow, W.G.: On the theory of systematic sampling, II. The Annals of Mathematical Statistics 20, 333–354 (1949)
Omori, Y.: Measuring identification disclosure risk for categorical microdata by posterior population uniqueness. In: Proceedings of the Conference on Statistical Data Protection, Lisbon, March 25-27, 1998, pp. 59–76. Eurostat, Luxembourg (1999)
Polettini, S.: Some remarks on the individual risk methodology. In: Proceedings of the Joint ECE/Eurostat Work Session on Statistical Data Confidentiality, Luxembourg, April 7-9 (2003)
Polettini, S.: Revision of Guidelines for the protection of social micro-data using individual risk methodology: Application within μ-Argus version 3.2, by S. Polettini and G. Seri. CASC-Computational Aspects of Statistical Confidentiality Deliverable No: 1.2-D3 (2004), available at http://neon.vb.cbs.nl/casc/deliv/CASC_1.2D3_guidelines_new.pdf
Purcell, N.J., Kish, L.: Postcensal estimates for local areas (small domains). International Statistical Review 48, 3–18 (1980)
Rao, J.N.K.: Small area estimation. John Wiley & Sons, Hoboken (2003)
Reiter, J.P.: Estimating risks of identification disclosure for microdata. Journal of the American Statistical Association 100, 1103–1113 (2005)
Reiter, J.P.: Releasing multiply-imputed, synthetic public use microdata: An illustration and empirical study. Journal of the Royal Statistical Society, Series A 168 (2005)
Rinott, Y.: On models for statistical disclosure risk estimation. In: Proceedings of the Joint ECE/Eurostat Work Session on Statistical Data Confidentiality, Luxembourg, April 7-9 (2003)
Skinner, C.J., Elliot, M.J.: A measure of disclosure risk for microdata. Journal of the Royal Statistical Society, Series B 64, 855–867 (2002)
Skinner, C.J., Holmes, D.J.: Estimating the re-identification risk per record in microdata. Journal of Official Statistics 14, 361–372 (1998)
Willenborg, L., de Waal, T.: Elements of Statistical Disclosure Control. Springer, New York (2001)
Zhang, L., Chambers, R.L.: Small area estimates for cross-classifications. J. R. Stat. Soc. Ser. B Stat. Methodol. 66(2), 479–496 (2004)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Di Consiglio, L., Polettini, S. (2006). Improving Individual Risk Estimators. In: Domingo-Ferrer, J., Franconi, L. (eds) Privacy in Statistical Databases. PSD 2006. Lecture Notes in Computer Science, vol 4302. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11930242_21
DOI: https://doi.org/10.1007/11930242_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49330-3
Online ISBN: 978-3-540-49332-7
eBook Packages: Computer Science, Computer Science (R0)