Surrogate data – a secure way to share corporate data

Journal of Computer-Aided Molecular Design

Summary

The privacy of chemical structures is of paramount importance for the industrial sector, in particular for the pharmaceutical industry. At the same time, companies handle large amounts of physico-chemical and biological data that could be shared to improve our molecular understanding of pharmacokinetic and toxicological properties. Such sharing could improve predictivity and shorten drug development time, in particular in the early phases of drug discovery. The current study provides some theoretical limits on the information required to reverse engineer molecules from generated descriptors and demonstrates that the information content of molecules can be less than one bit per atom. Thus, in theory, a single descriptor can suffice to disclose the molecular structure completely. Instead of sharing descriptors, we propose to share surrogate data. Sharing surrogate data simply means sharing reliably predicted molecules. Surrogate data can provide the same information as the original set. We consider the practical application of this idea to the prediction of lipophilicity of chemical compounds, and we demonstrate that surrogate and real (original) data provide similar prediction ability. Thus, our proposed strategy makes it possible to share not only descriptors but also complete collections of surrogate molecules, without the danger of disclosing the underlying molecular structures.
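The surrogate-data idea summarized above can be illustrated with a toy sketch. It is not the authors' implementation: it assumes that surrogates are drawn from a hypothetical pool of non-confidential molecules, that molecules are represented by made-up descriptor vectors, that the private model is a simple nearest-neighbour predictor, and that a prediction counts as "reliable" when the nearest private compound lies within a chosen distance. The names `nearest`, `make_surrogates` and `max_distance` are illustrative only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest(train, x):
    """Return (distance, value) for the training point closest to x."""
    return min((dist(desc, x), value) for desc, value in train)


def make_surrogates(private, public_pool, max_distance):
    """Label public molecules with predictions from the private model.

    Only molecules whose nearest private neighbour is within
    max_distance (our stand-in for a 'reliable' prediction) are kept.
    The private descriptors themselves are never returned.
    """
    surrogates = []
    for desc in public_pool:
        d, predicted = nearest(private, desc)
        if d <= max_distance:
            surrogates.append((desc, predicted))
    return surrogates


# Hypothetical private data: (descriptor vector, measured property).
private = [((0.0, 0.0), 1.0), ((1.0, 1.0), 2.0), ((4.0, 4.0), 5.0)]

# Hypothetical pool of molecules that are safe to disclose.
public_pool = [(0.1, 0.0), (1.0, 0.9), (10.0, 10.0)]

surrogates = make_surrogates(private, public_pool, max_distance=0.5)
for desc, value in surrogates:
    print(desc, value)
```

In this sketch the shared set contains only public molecules paired with predicted values; the third pool entry is dropped because it lies too far from any private compound to be predicted reliably under the assumed criterion.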


Acknowledgement

The authors thank Scott Hutton for providing compounds from the iResearch library (ChemNavigator) used in the current study, and Cristian Bologa (University of New Mexico Division of Biocomputing) and Philip Wong (Institute for Bioinformatics) for their technical help. The authors thank Robert S. Pearlman for his constructive comments. Part of this work was supported by the INTAS “Virtual Computational Chemistry Laboratory” (http://www.vcclab.org) grant (IVT) and by New Mexico Tobacco Settlement Funds for Biocomputing (TIO).

Author information

Corresponding author

Correspondence to Igor V. Tetko.

About this article

Cite this article

Tetko, I.V., Abagyan, R. & Oprea, T.I. Surrogate data – a secure way to share corporate data. J Comput Aided Mol Des 19, 749–764 (2005). https://doi.org/10.1007/s10822-005-9013-3

  • DOI: https://doi.org/10.1007/s10822-005-9013-3
