Valid Statistical Analysis for Logistic Regression with Multiple Sources

  • Conference paper
Protecting Persons While Protecting the People (ISIPS 2008)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 5661)

Abstract

Considerable effort has gone into understanding privacy protection for individual information in single databases, and various solutions have been proposed, depending on the nature of the data, how the database will be used, and the precise privacy protection being offered. Once data are merged across sources, however, the problem becomes far more complex, and a number of privacy issues arise for the linked individual files that go well beyond those considered for the data within individual sources. In this paper, we propose an approach that supports full statistical analysis of the combined database without actually combining it. We focus mainly on logistic regression, but the method and tools described can, in essence, be applied to other statistical models as well.
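To make the idea of analysing a combined database "without actually combining it" concrete, here is a minimal sketch. It is not the protocol from the paper. It assumes the data are horizontally partitioned (each party holds different records on the same variables) and that the parties share only summed gradient and Hessian contributions of the logistic log-likelihood. The plain addition in `fit` stands in for a secure summation protocol, and all names (`local_stats`, `solve`, `fit`, `party_a`, `party_b`) and the toy data are hypothetical.

```python
import math

def local_stats(X, y, beta):
    """One party's gradient and Hessian contributions at the current beta."""
    p = len(beta)
    grad = [0.0] * p
    hess = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        mu = 1.0 / (1.0 + math.exp(-eta))   # fitted probability
        w = mu * (1.0 - mu)                 # logistic variance weight
        for j in range(p):
            grad[j] += (yi - mu) * xi[j]
            for k in range(p):
                hess[j][k] += w * xi[j] * xi[k]
    return grad, hess

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit(parties, p, iters=25):
    """Newton-Raphson using only per-party summaries, never raw records."""
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for X, y in parties:
            g, h = local_stats(X, y, beta)
            # Assumption: in practice this sum would be a secure summation.
            grad = [a + b for a, b in zip(grad, g)]
            hess = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(hess, h)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Toy data: each row is [intercept, covariate]; responses are 0/1.
party_a = ([[1, 0], [1, 1], [1, 2], [1, 3]], [0, 0, 1, 0])
party_b = ([[1, 1], [1, 2], [1, 3], [1, 4]], [1, 0, 1, 1])

beta_split = fit([party_a, party_b], p=2)

# Reference fit on the physically pooled data, for comparison only.
pooled = (party_a[0] + party_b[0], party_a[1] + party_b[1])
beta_pooled = fit([pooled], p=2)
```

Because the log-likelihood is a sum over records, the summed per-party statistics equal those of the pooled data, so `beta_split` and `beta_pooled` agree up to floating-point rounding. What the shared summaries themselves may reveal about individual records is a separate question that this sketch does not address.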

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fienberg, S.E., Nardi, Y., Slavković, A.B. (2009). Valid Statistical Analysis for Logistic Regression with Multiple Sources. In: Gal, C.S., Kantor, P.B., Lesk, M.E. (eds) Protecting Persons While Protecting the People. ISIPS 2008. Lecture Notes in Computer Science, vol 5661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10233-2_8

  • DOI: https://doi.org/10.1007/978-3-642-10233-2_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10232-5

  • Online ISBN: 978-3-642-10233-2

  • eBook Packages: Computer Science, Computer Science (R0)
