Summary
We present a method for performing statistically valid linear regression on the union of distributed chemical databases while preserving the confidentiality of each database. The method employs secure multi-party computation to share the local sufficient statistics needed to compute least squares estimators of regression coefficients, error variances and other quantities of interest. We illustrate the method with an example involving four companies’ rather different databases.
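To make the idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the databases are horizontally partitioned (every company records the same descriptors for different compounds), so the pooled sufficient statistics X'X, X'y, y'y and n are simply sums of each company's local values. It also assumes a ring-style secure summation: the first party adds a random mask, each party adds its own values to the running total, and the first party removes the mask at the end. The fixed-point `SCALE`, the `MODULUS`, and all function names are hypothetical illustrative choices. The pooled estimates are beta = (X'X)^-1 X'y and sigma^2 = (y'y - beta'X'y) / (n - p).

```python
import secrets
import numpy as np

SCALE = 10**6      # hypothetical fixed-point precision for encoding floats
MODULUS = 2**128   # hypothetical modulus; all masked sums are taken mod this

def encode(arr):
    """Flatten a float array into fixed-point integers modulo MODULUS."""
    return [int(round(float(v) * SCALE)) % MODULUS for v in np.ravel(arr)]

def decode(ints, shape):
    """Map integers mod MODULUS back to signed floats with the given shape."""
    signed = [v - MODULUS if v > MODULUS // 2 else v for v in ints]
    return np.array(signed, dtype=float).reshape(shape) / SCALE

def secure_sum(local_arrays):
    """Ring-based secure summation of equally shaped arrays.

    Party 1 adds a uniform random mask; each party in turn adds its own
    encoded values to the masked running total it receives; party 1 then
    subtracts the mask. Each party sees only masked partial sums.
    """
    shape = np.shape(local_arrays[0])
    size = int(np.prod(shape))
    mask = [secrets.randbelow(MODULUS) for _ in range(size)]
    running = mask
    for arr in local_arrays:  # the message travels around the ring
        running = [(r + v) % MODULUS for r, v in zip(running, encode(arr))]
    total = [(r - m) % MODULUS for r, m in zip(running, mask)]
    return decode(total, shape)

def local_statistics(X, y):
    """Sufficient statistics one company computes on its own records."""
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    return X1.T @ X1, X1.T @ y, float(y @ y), float(len(y))

def pooled_regression(parties):
    """Least squares fit on the union of all parties' records."""
    stats = [local_statistics(X, y) for X, y in parties]
    XtX = secure_sum([s[0] for s in stats])
    Xty = secure_sum([s[1] for s in stats])
    yty = float(secure_sum([s[2] for s in stats]))
    n = float(secure_sum([s[3] for s in stats]))
    beta = np.linalg.solve(XtX, Xty)  # normal equations X'X beta = X'y
    p = XtX.shape[0]
    rss = yty - beta @ Xty            # y'y - beta'X'y, using X'X beta = X'y
    sigma2 = rss / (n - p)            # unbiased error-variance estimate
    return beta, sigma2

# Usage with four hypothetical companies holding different numbers of compounds.
rng = np.random.default_rng(0)
parties = []
for n_i in (40, 55, 30, 70):
    X = rng.normal(size=(n_i, 3))
    y = 1.5 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n_i)
    parties.append((X, y))
beta, sigma2 = pooled_regression(parties)
```

Because only summed statistics leave each site, the pooled coefficients and error variance match an ordinary least squares fit on the stacked data up to fixed-point rounding. The plain ring protocol assumes parties follow the protocol and do not collude; stronger guarantees would need a different secure-summation scheme.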
Acknowledgements
This research was supported by NSF Grant EIA-0131884 to the National Institute of Statistical Sciences (NISS) and by the HighQ Foundation. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Science Foundation. The data and structures used in this paper are available at www.niss.org/PowerMV.
Cite this article
Karr, A.F., Feng, J., Lin, X. et al. Secure analysis of distributed chemical databases without data integration. J Comput Aided Mol Des 19, 739–747 (2005). https://doi.org/10.1007/s10822-005-9011-5