Abstract
Considerable effort has gone into understanding how to protect the privacy of individual information in single databases, and various solutions have been proposed depending on the nature of the data, the ways in which the database will be used, and the precise nature of the privacy protection being offered. Once data are merged across sources, however, the problem becomes far more complex, and a number of privacy issues arise for the linked individual files that go well beyond those considered for the data within individual sources. In this paper, we propose an approach that allows full statistical analysis of the combined database without actually combining it. We focus mainly on logistic regression, but the method and tools described may, in essence, be applied to other statistical models as well.
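The abstract does not spell out the protocol, so the following is only a minimal sketch of one way a logistic regression could be fitted to a combined database without pooling the records. It assumes horizontally partitioned data (each party holds the same variables for different individuals) and uses plain addition as a stand-in for a secure summation protocol. All names (`local_stats`, `solve`, `fit`, `party_a`, `party_b`) and the toy data are hypothetical and are not taken from the paper.

```python
import math

def local_stats(rows, beta):
    """One party's score vector and information matrix at beta.

    rows is a list of (x, y) pairs; x includes a leading 1 for the intercept.
    """
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for x, y in rows:
        eta = sum(b * xi for b, xi in zip(beta, x))
        mu = 1.0 / (1.0 + math.exp(-eta))
        w = mu * (1.0 - mu)
        for i in range(p):
            score[i] += (y - mu) * x[i]
            for j in range(p):
                info[i][j] += w * x[i] * x[j]
    return score, info

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit(parties, p, iters=25):
    """Newton-Raphson using only per-party aggregates, never raw rows."""
    beta = [0.0] * p
    for _ in range(iters):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for rows in parties:
            s, h = local_stats(rows, beta)  # computed locally by each party
            # Assumption: in practice these sums would be formed by a secure
            # summation protocol; plain addition is used here for illustration.
            score = [a + b for a, b in zip(score, s)]
            info = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(info, h)]
        step = solve(info, score)
        beta = [b + d for b, d in zip(beta, step)]
    return beta

# Hypothetical toy data: x = (1, covariate), y = binary outcome.
party_a = [((1.0, 0.5), 0), ((1.0, 1.5), 1), ((1.0, 2.0), 0), ((1.0, 3.0), 1)]
party_b = [((1.0, 1.0), 0), ((1.0, 2.5), 1), ((1.0, 0.8), 1), ((1.0, 3.5), 0)]

beta_split = fit([party_a, party_b], p=2)    # parties keep their own rows
beta_pooled = fit([party_a + party_b], p=2)  # reference fit on merged rows
```

Because the score and information matrix are sums over records, the estimates from the split fit should agree with the pooled fit up to floating-point rounding. Whether the shared aggregates themselves leak information is a separate question that this sketch does not address.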
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fienberg, S.E., Nardi, Y., Slavković, A.B. (2009). Valid Statistical Analysis for Logistic Regression with Multiple Sources. In: Gal, C.S., Kantor, P.B., Lesk, M.E. (eds) Protecting Persons While Protecting the People. ISIPS 2008. Lecture Notes in Computer Science, vol 5661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10233-2_8
DOI: https://doi.org/10.1007/978-3-642-10233-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10232-5
Online ISBN: 978-3-642-10233-2
eBook Packages: Computer Science (R0)