Logistic Regression with Variables Subject to Post Randomization Method

  • Conference paper
Privacy in Statistical Databases (PSD 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7556)

Abstract

The Post Randomization Method (PRAM) is a disclosure avoidance method in which values of categorical variables are perturbed via a known probability mechanism and only the perturbed data are released, which raises issues of disclosure risk and data utility. In this paper, we develop and implement a number of EM algorithms to obtain unbiased estimates of the logistic regression model from data subject to PRAM, thereby accounting for the effects of PRAM and preserving data utility. Three cases are considered: (1) covariates subject to PRAM, (2) the response variable subject to PRAM, and (3) both covariates and the response variable subject to PRAM. The proposed techniques improve on current methodology by extending the applicability of PRAM to a wider range of products, and they could be extended to other types of generalized linear models. The effects of the level of perturbation and of sample size on the estimates are evaluated, and relevant standard error estimates are developed and reported.
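To make case (2) concrete, the following is a minimal sketch of how an EM algorithm might correct a logistic regression fit when a binary response has been perturbed by PRAM. It is an illustration under stated assumptions, not the authors' implementation: the function names (`pram`, `fit_logistic`, `em_logistic_pram_response`), the transition matrix `P`, and the simulated data are all hypothetical. The E-step replaces each unobserved true response with its posterior probability given the released value, and the M-step refits the logistic model to those fractional responses.

```python
import numpy as np

def pram(values, P, rng):
    """Apply PRAM to integer-coded categories.

    P[k, j] is the known probability that true category k is released as j.
    """
    return np.array([rng.choice(P.shape[1], p=P[v]) for v in values])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, w, beta, n_newton=25):
    """Newton-Raphson logistic fit; w may hold fractional responses in [0, 1]."""
    for _ in range(n_newton):
        p = sigmoid(X @ beta)
        grad = X.T @ (w - p)                          # score vector
        hess = (X * (p * (1.0 - p))[:, None]).T @ X   # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def em_logistic_pram_response(X, y_star, P, n_iter=200, tol=1e-8):
    """EM for logistic regression when only the PRAMed response y_star is seen."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # E-step: posterior probability that the true response is 1,
        # given the released value and the current coefficients.
        p = sigmoid(X @ beta)
        num = p * P[1, y_star]
        w = num / (num + (1.0 - p) * P[0, y_star])
        # M-step: logistic fit with the posterior probabilities as responses.
        new_beta = fit_logistic(X, w, beta)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Simulated data (all values here are assumptions chosen for illustration).
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = (rng.random(n) < sigmoid(X @ beta_true)).astype(int)

P = np.array([[0.9, 0.1],    # true 0 released as 0 or 1
              [0.2, 0.8]])   # true 1 released as 0 or 1
y_star = pram(y, P, rng)

# Naive fit ignores PRAM; the EM fit uses the known matrix P.
beta_naive = fit_logistic(X, y_star.astype(float), np.zeros(2))
beta_em = em_logistic_pram_response(X, y_star, P)
print("naive:", beta_naive)
print("EM:   ", beta_em)
```

In a simulation of this kind, the naive slope would be expected to shrink toward zero because the perturbation acts as misclassification of the response, while the EM estimate should sit close to the generating coefficients. Standard errors are not computed in this sketch.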

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Woo, Y.M.J., Slavković, A.B. (2012). Logistic Regression with Variables Subject to Post Randomization Method. In: Domingo-Ferrer, J., Tinnirello, I. (eds) Privacy in Statistical Databases. PSD 2012. Lecture Notes in Computer Science, vol 7556. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33627-0_10

  • DOI: https://doi.org/10.1007/978-3-642-33627-0_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33626-3

  • Online ISBN: 978-3-642-33627-0

  • eBook Packages: Computer Science, Computer Science (R0)
