Descriptor collision and confusion: Toward the design of descriptors to mask chemical structures

Journal of Computer-Aided Molecular Design

Summary

We examined “descriptor collision” for several chemical fingerprint systems (MDL 320, Daylight, SMDL) and for a 2D-based descriptor set. For large databases (ChemNavigator and WOMBAT), the smallest collision rate remained around 5%. We then systematically increased the descriptor collision rate (here termed “descriptor confusion”) in order to design a set of “descriptors to mask chemical structures” (DMCS). An effective DMCS system would not allow third parties to determine the original chemical structures used to derive the DMCS set (i.e., it would resist reverse engineering). Using SMDL keys, the confusion rate was increased to 45.6% by eliminating those keys that occur with low frequency in WOMBAT structures. To validate the biological relevance of the SMDL descriptors as a potential DMCS set, we applied an automated PLS engine, WB-PLS [Olah et al., J. Comput. Aided Mol. Des., 18 (2004) 437], to 1277 series of structures from 948 targets in WOMBAT. The “reduced set” of SMDL descriptors shows a small loss of modeling power (around 20%) compared to the initial descriptor set, while its collision rate is significantly higher. These results indicate that the development of an effective DMCS is possible. If well documented, DMCS systems would encourage private sector data release (e.g., related to water solubility) and directly benefit public sector science.
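The two quantities discussed in the summary can be illustrated with a minimal sketch. It assumes a working definition that the summary itself does not spell out: the collision rate is taken to be the fraction of distinct structures whose key vector is identical to that of at least one other structure. The function names `collision_rate` and `drop_rare_keys`, the `min_fraction` threshold, and the toy bit vectors are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def collision_rate(fingerprints):
    """Fraction of structures whose fingerprint is shared with another structure.

    Assumed definition (not taken from the paper): each fingerprint is a tuple
    of 0/1 key values, and each entry stands for a distinct structure.
    """
    counts = Counter(fingerprints)
    collided = sum(n for n in counts.values() if n > 1)
    return collided / len(fingerprints)


def drop_rare_keys(fingerprints, min_fraction):
    """Keep only the key positions set in at least min_fraction of structures."""
    n = len(fingerprints)
    width = len(fingerprints[0])
    frequency = [sum(fp[i] for fp in fingerprints) / n for i in range(width)]
    kept = [i for i in range(width) if frequency[i] >= min_fraction]
    reduced = [tuple(fp[i] for i in kept) for fp in fingerprints]
    return reduced, kept


# Hypothetical toy data: six distinct structures described by four keys.
toy = [
    (1, 1, 0, 0),
    (1, 1, 0, 1),
    (1, 0, 1, 0),
    (1, 0, 0, 0),
    (0, 1, 0, 0),
    (0, 1, 0, 0),
]

before = collision_rate(toy)
reduced, kept = drop_rare_keys(toy, min_fraction=0.3)
after = collision_rate(reduced)
print(f"kept keys: {kept}, collision rate: {before:.2f} -> {after:.2f}")
```

On this toy set, removing the two rarely set keys makes every remaining key vector ambiguous, which is the intended masking effect. Whether the reduced keys still carry enough information for modeling is a separate question, which the summary addresses through the WB-PLS comparison.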

Abbreviations

CMR: calculated molecular refractivity
ClogP: program produced by BioByte Corp., Claremont, CA
Daylight/DY: Daylight Chemical Information Systems
DMCS: descriptors to mask chemical structures
DMSO: dimethylsulfoxide
DPISMR: the NIH Small Molecule Repository as organized by DPI
LogP: the logarithm of the octanol-water partition coefficient
LogSw: the logarithm of the (molar) aqueous solubility
MACCS: Molecular ACCess System, an MDL product
MDL: Molecular Design Limited
MLI: Molecular Libraries and Imaging initiative
NIH: National Institutes of Health
PLS: Partial Least Squares/Projection to Latent Structures
QSAR: quantitative structure–activity relationships
SMDL: Sunset Molecular Discovery, LLC
SMILES: Simplified Molecular Input Line Entry Specification
WOMBAT/WB: WOrld of Molecular BioAcTivity database

References

  1. Austin, C.P., Brady, L.S., Insel, T.R. and Collins, F.S., Science, 306 (2004) 1138. Last access on 21.10.05
  2. The PubChem database is available online at the National Center for Biotechnology Information, http://pubchem.ncbi.nlm.nih.gov/ Last access on 21.10.05
  3. Hahn, M.M. and Green, R., Curr. Opin. Chem. Biol., 3 (1999) 379.
  4. Filimonov, D. and Poroikov, V., J. Comput. Aided Mol. Des., 19 (2005) in press
  5. Weber, L., Curr. Opin. Chem. Biol., 2 (1998) 381.
  6. The iResearch Library™ is available from ChemNavigator, Inc., http://chemnavigator.com/cnc/products/IRL.asp Last access on 21.10.05
  7. The Crossfire Beilstein database is available from Elsevier MDL, http://www.mdl.com/products/knowledge/crossfire_beilstein/index.jsp Last access on 21.10.05
  8. Tetko, I.V., Abagyan, R. and Oprea, T.I., J. Comput. Aided Mol. Des., 19 (2005) in press
  9. Faulon, J.L., Brown, W.M. and Martin, S., J. Comput. Aided Mol. Des., 19 (2005) in press
  10. Olah, M., Mracec, M., Ostopovici, L., Rad, R., Bora, A., Hadaruga, N., Olah, I., Banda, M., Simon, S., Mracec, M. and Oprea, T.I., In Oprea, T.I. (Ed), Chemoinformatics in Drug Discovery, Wiley-VCH, New York, 2005, pp. 223–239
  11. WOMBAT is available from Sunset Molecular Discovery LLC, http://www.sunsetmolecular.com/ Last access on 21.10.05
  12. Weininger, D., J. Chem. Inf. Comput. Sci., 28 (1988) 31.
  13. Leo, A. and Weininger, D., CMR3. Daylight Chemical Information Systems, Santa Fe, New Mexico, http://www.daylight.com/, 1995
  14. Leo, A., Chem. Rev., 93 (1993) 1281.
  15. Leo, A. and Weininger, D., CLOGP 4.0. Daylight Chemical Information Systems, Santa Fe, New Mexico, http://www.daylight.com/, 2001
  16. Ran, Y., Jain, N. and Yalkowsky, S.H., J. Chem. Inf. Comput. Sci., 41 (2001) 1208.
  17. Livingstone, D.J., Ford, M.G., Huuskonen, J.J. and Salt, D.W., J. Comput. Aided Mol. Des., 15 (2001) 741.
  18. Tetko, I.V., Tanchuk, V.Y. and Villa, A.E., J. Chem. Inf. Comput. Sci., 41 (2001) 1407.
  19. Glen, R.C., J. Comput. Aided Mol. Des., 8 (1994) 457.
  20. Gasteiger, J. and Marsili, M., Tetrahedron, 36 (1980) 3219.
  21. Oprea, T.I., J. Comput. Aided Mol. Des., 14 (2000) 251.
  22. Balaban, A.T., SAR QSAR Environ. Res., 8 (1998) 1.
  23. Kier, L.B. and Hall, L.H., Molecular Connectivity in Structure-Activity Analysis. John Wiley, New York, 1986.
  24. Basak, S.C., Balaban, A.T., Grunwald, G.D. and Gute, B.D., J. Chem. Inf. Comput. Sci., 40 (2000) 891.
  25. Durant, J.L., Leland, B.A., Henry, D.R. and Nourse, J.G., J. Chem. Inf. Comput. Sci., 42 (2002) 1273.
  26. MacCuish, J. and MacCuish, N., Measures software, Mesa Analytics and Computing LLC, Santa Fe, New Mexico, http://www.mesaac.com/ Last access on 21.10.05
  27. Daylight fingerprints are available from Daylight Chemical Information Systems, http://www.daylight.com/ Last access on 21.10.05
  28. Olah, M., Bologa, C. and Oprea, T.I., J. Comput. Aided Mol. Des., 18 (2004) 437.
  29. Schneider, G., Neidhart, W., Giller, T. and Schmidt, G., Angew. Chem. Int. Ed., 38 (1999) 2894.
  30. The SMARTS toolkit and SMARTS are available from Daylight Chemical Information Systems, Santa Fe, New Mexico, http://www.daylight.com/dayhtml/doc/theory.smarts.html; online SMARTS tutorial: http://www.daylight.com/dayhtml/doc/theory/smarts.html, 2005
  31. SMACK and OEChem are available from OpenEye Scientific Software, Santa Fe, New Mexico, http://www.eyesopen.com/products/applications/smack.html, 2005
  32. Wold, S., Johansson, E. and Cocchi, M., In Kubinyi, H. (Ed), 3D QSAR in Drug Design: Theory, Methods and Applications, ESCOM, Leiden, 1993, pp. 523–550
  33. Kappler, M.A., Allu, T.K., Bologa, C. and Oprea, T.I., J. Chem. Inf. Model., 45 (2005) in preparation

Acknowledgments

We thank Jeremy (JJ) Yang from OpenEye Scientific Software (Santa Fe, NM) for advice on descriptor collision. This work was supported by New Mexico Tobacco Settlement Funds for Biocomputing (TKA, MO) and by the New Mexico Molecular Library Screening Center, NIH 1U54 MH074425-01 (CB, TIO).

Author information

Correspondence to Tudor I. Oprea.

Cite this article

Bologa, C., Allu, T.K., Olah, M. et al. Descriptor collision and confusion: Toward the design of descriptors to mask chemical structures. J Comput Aided Mol Des 19, 625–635 (2005). https://doi.org/10.1007/s10822-005-9020-4

  • DOI: https://doi.org/10.1007/s10822-005-9020-4
