Abstract
The Post Randomization Method (PRAM) is a disclosure avoidance method in which values of categorical variables are perturbed via a known probability mechanism and only the perturbed data are released, which raises issues of disclosure risk and data utility. In this paper, we develop and implement a number of EM algorithms to obtain unbiased estimates of the logistic regression model from data subject to PRAM, thereby accounting for the effects of PRAM and preserving data utility. Three cases are considered: (1) covariates subject to PRAM, (2) response variable subject to PRAM, and (3) both covariates and response variables subject to PRAM. The proposed techniques improve on current methodology by extending the applicability of PRAM to a wider range of products, and they could be extended to other types of generalized linear models. The effects of the level of perturbation and of sample size on the estimates are evaluated, and relevant standard error estimates are developed and reported.
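As a rough illustration of case (2), the sketch below applies PRAM to a binary response and then fits a logistic regression with an EM loop that treats the true response as missing data. It is not the authors' implementation: the function names (`pram_binary`, `em_logistic_pram_response`), the 2x2 transition matrix `P` with `P[i, j]` the probability of releasing category `j` when the true category is `i`, and the use of Newton steps in the M-step are all assumptions made for this example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def pram_binary(y, P, rng):
    """Perturb a 0/1 integer array: release 1 with probability P[y_i, 1]."""
    return (rng.random(y.shape[0]) < P[y, 1]).astype(int)


def em_logistic_pram_response(X, y_star, P, n_iter=500, tol=1e-8):
    """EM sketch for logistic regression when only the PRAMed response is seen.

    X      : (n, d) design matrix (include a column of ones for an intercept)
    y_star : (n,) released 0/1 integer responses
    P      : (2, 2) known PRAM transition matrix, rows sum to 1
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # E-step: posterior probability that the true response is 1,
        # given the released value and the current coefficients.
        p = sigmoid(X @ beta)
        num = p * P[1, y_star]
        w = num / (num + (1.0 - p) * P[0, y_star])

        # M-step: logistic regression on the fractional responses w,
        # solved with Newton iterations started at the current beta.
        new = beta.copy()
        for _ in range(25):
            q = sigmoid(X @ new)
            grad = X.T @ (w - q)
            hess = X.T @ (X * (q * (1.0 - q))[:, None])
            step = np.linalg.solve(hess, grad)
            new = new + step
            if np.max(np.abs(step)) < 1e-10:
                break

        if np.max(np.abs(new - beta)) < tol:
            return new
        beta = new
    return beta
```

A typical use would simulate or load `X` and the released `y_star`, pass the published matrix `P`, and compare the result with a naive logistic fit on `y_star`, which ignores the perturbation. Standard errors are not computed in this sketch.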
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Woo, Y.M.J., Slavković, A.B. (2012). Logistic Regression with Variables Subject to Post Randomization Method. In: Domingo-Ferrer, J., Tinnirello, I. (eds) Privacy in Statistical Databases. PSD 2012. Lecture Notes in Computer Science, vol 7556. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33627-0_10
DOI: https://doi.org/10.1007/978-3-642-33627-0_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33626-3
Online ISBN: 978-3-642-33627-0
eBook Packages: Computer Science (R0)