An automated PLS search for biologically relevant QSAR descriptors

Journal of Computer-Aided Molecular Design

Abstract

An automated PLS engine, WB-PLS, was applied to 1632 QSAR series, each with at least 25 compounds, extracted from WOMBAT (WOrld of Molecular BioAcTivity). For each series, WB-PLS extracts a single Y variable, together with pre-computed X variables from a descriptor table. The table contained 2D descriptors, the drug-like MDL 320 keys as implemented in the Mesa A&C Fingerprint module, and in-house generated topological-pharmacophore SMARTS counts and fingerprints. Each descriptor type was treated as a block, with or without scaling. Cross-validation, a variable importance on projections (VIP) threshold of 0.8, and q² ≥ 0.3 were used as criteria for model significance. Among the cross-validation methods, leave-one-in-seven-out (CV7) is a better measure of model significance than leave-one-out (which measures redundancy) or leave-half-out (which is too restrictive). SMARTS counts overlap with 2D descriptors (both being more quantitative in nature), whereas MDL keys overlap with in-house fingerprints (both being more qualitative). Of the four descriptor systems, SMARTS counts are the most effective. At the level of individual descriptors, size-related descriptors and topological indices (in the 2D property space), as well as branched SMARTS, aromatic and ring atom types, and halogens, were found to be the most relevant according to the VIP criterion.
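The block treatment of descriptor types and the VIP > 0.8 filter can be illustrated with a minimal sketch. It assumes NumPy, a single-response PLS (PLS1) fitted with the NIPALS algorithm, and block scaling implemented as autoscaling followed by division by the square root of the block size. These choices, the helper names (`autoscale`, `block_scale`, `pls1_nipals`, `vip`) and the synthetic data are illustrative assumptions, not the WB-PLS implementation.

```python
import numpy as np


def autoscale(X):
    """Center each column and scale it to unit variance (constant columns are only centered)."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    return (X - mu) / sd


def block_scale(blocks):
    """Autoscale each descriptor block, then divide by sqrt(k) so that every block of
    k columns contributes the same total variance (assumed block-scaling convention)."""
    return np.hstack([autoscale(B) / np.sqrt(B.shape[1]) for B in blocks])


def pls1_nipals(X, y, n_comp):
    """PLS1 via NIPALS on centered X (n x p) and centered y (n,).
    Returns unit-norm weights W (p x A), X-loadings P (p x A), scores T (n x A), y-loadings q (A,)."""
    E, f = X.astype(float).copy(), y.astype(float).copy()
    n, p = E.shape
    W, P = np.zeros((p, n_comp)), np.zeros((p, n_comp))
    T, q = np.zeros((n, n_comp)), np.zeros(n_comp)
    for a in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)          # unit-length weight vector
        t = E @ w                       # scores for component a
        tt = t @ t
        P[:, a] = E.T @ t / tt          # X-loadings
        q[a] = f @ t / tt               # y-loading
        E -= np.outer(t, P[:, a])       # deflate X
        f -= q[a] * t                   # deflate y
        W[:, a], T[:, a] = w, t
    return W, P, T, q


def vip(W, T, q):
    """VIP_j = sqrt(p * sum_a SSY_a * w_ja^2 / sum_a SSY_a), with SSY_a = q_a^2 * t_a't_a,
    the y sum of squares explained by component a (W must have unit-length columns)."""
    p = W.shape[0]
    ssy = q ** 2 * np.sum(T ** 2, axis=0)
    return np.sqrt(p * ((W ** 2) @ ssy) / ssy.sum())


# Synthetic example: two hypothetical descriptor blocks and one activity column.
rng = np.random.default_rng(0)
n = 40
block_2d = rng.normal(size=(n, 5))                           # stand-in for 2D descriptors
block_counts = rng.poisson(2.0, size=(n, 8)).astype(float)   # stand-in for SMARTS counts
X = block_scale([block_2d, block_counts])
y = 2.0 * block_2d[:, 0] - 1.5 * block_counts[:, 3] + rng.normal(scale=0.5, size=n)
y = y - y.mean()

W, P, T, q = pls1_nipals(X, y, n_comp=2)
v = vip(W, T, q)
selected = np.flatnonzero(v > 0.8)   # column indices passing the VIP > 0.8 filter
print("VIP:", np.round(v, 2))
print("selected columns:", selected)
```

Because each weight vector has unit length, the squared VIP values average to 1 over all X variables, so a cutoff just below 1 (0.8 here, as in the abstract) keeps variables whose contribution is near or above average.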

References

  • C. Hansch, T. Fujita (1964) J. Am. Chem. Soc., 86, 1616

  • S.M. Free Jr., J.W. Wilson (1964) J. Med. Chem., 7, 395

  • R. Todeschini, V. Consonni (2000) Handbook of Molecular Descriptors, Wiley-VCH, Weinheim

  • C. Hansch, A. Leo (1995) Exploring QSAR. Fundamentals and Applications in Chemistry and Biology, ACS Publishers, Washington, DC

  • D.J. Livingstone (2000) J. Chem. Inf. Comput. Sci., 40, 195

  • H. Kubinyi, unpublished results.

  • A. Leo, D. Weininger, CMR3, Daylight Chemical Information Systems, Santa Fe, New Mexico, http://www.daylight.com/, 1995.

  • A. Leo (1993) Chem. Rev., 93, 1281

  • A. Leo, D. Weininger, CLOGP 4.0, Daylight Chemical Information Systems, Santa Fe, New Mexico, http://www.daylight.com/, 2001.

  • http://www.qsar.org/resource/software/htm, accessed in June 2002.

  • Y. Ran, N. Jain, S.H. Yalkowsky (2001) J. Chem. Inf. Comput. Sci., 41, 1208

  • D.J. Livingstone, M.G. Ford, J.J. Huuskonen, D.W. Salt (2001) J. Comput.-Aided Mol. Design, 15, 741

  • R.C. Glen (1994) J. Comput.-Aided Mol. Design, 8, 457

  • J. Hinze, H.H. Jaffe (1962) J. Am. Chem. Soc., 84, 540

  • J. Hinze, M.A. Whitehead, H.H. Jaffe (1963) J. Am. Chem. Soc., 85, 148

  • J. Gasteiger, M. Marsili (1980) Tetrahedron, 36, 3219

  • Hansch et al. (2003) J. Chem. Inf. Comput. Sci., 43, 120

  • O.A. Raevsky, V.Yu. Grigor’ev, D. Kireev, N.S. Zefirov (1992) Quant. Struct.-Act. Relat., 11, 49

  • HYBOT. TimTec Inc., Moscow, Russia, http://www.timtec.net/software/hybotplus.htm, 1998.

  • A.M. Zissimos, M.H. Abraham, M.C. Barker, K.J. Box, K.Y. Tam (2002) J. Chem. Soc. Perkin 2, 3, 470

  • L.B. Kier, L.H. Hall (1999) Molecular Structure Description: The Electrotopological State, Academic Press, New York

  • T.I. Oprea (2000) J. Comput.-Aided Mol. Design, 14, 251

  • A.T. Balaban (1998) SAR QSAR Environ. Res., 8, 1

  • L.B. Kier, L.H. Hall (1986) Molecular Connectivity in Structure-Activity Analysis, John Wiley, New York

  • An analysis [26] using over 200 topological indices on over 1000 diverse structures revealed that these descriptors are grouped in 18 clusters that can be related to size, bond information, and molecular complexity (among other properties).

  • S.C. Basak, A.T. Balaban, G.D. Grunwald, B.D. Gute (2000) J. Chem. Inf. Comput. Sci., 40, 891

  • R.D. Cramer III, D.E. Patterson, J.D. Bunce (1988) J. Am. Chem. Soc., 110, 5959

  • P.J. Goodford (1985) J. Med. Chem., 28, 849

  • S. Wold, E. Johansson, M. Cocchi (1993) In H. Kubinyi (Ed.), 3D QSAR in Drug Design: Theory, Methods and Applications, ESCOM, Leiden, pp. 523-550.

  • H. Kubinyi (Ed.) (1993) 3D QSAR in Drug Design: Theory, Methods and Applications, ESCOM, Leiden

  • H. Kubinyi, G. Folkers, Y.C. Martin (1998) 3D QSAR in Drug Design, Vol. 2. Ligand Protein Interactions and Molecular Similarity, Kluwer/ESCOM, Dordrecht

  • H. Kubinyi, G. Folkers, Y.C. Martin (1998) 3D QSAR in Drug Design, Vol. 3. Recent Advances, Kluwer/ESCOM, Dordrecht

  • R.D. Cramer III, S.B. Wold, US Pat. 5025388 (1991) (CAN 115:135113).

  • S.H. Unger, C. Hansch (1973) J. Med. Chem., 16, 745

  • D.C. Whitley, M.G. Ford, D.J. Livingstone (2000) J. Chem. Inf. Comput. Sci., 40, 1160

  • M.M.C. Ferreira, C.A. Montanari, A.C. Gaudio (2002) Quimica Nova, 25, 439

  • O. Nicolotti, V.J. Gillet, P.J. Fleming, D.V.S. Green (2002) J. Med. Chem., 45, 5069

  • A. Golbraikh, M. Shen, Z. Xiao, Y.-D. Xiao, K.-H. Lee, A. Tropsha (2003) J. Comput.-Aided Mol. Design, 17, 241

  • D. Weininger (1988) J. Chem. Inf. Comput. Sci., 28, 31

  • WB-PLS 1.0, developed at Sunset Molecular Discovery LLC, Santa Fe, New Mexico, http://www.sunsetmolecular.com/, 2004.

  • WOMBAT database, Sunset Molecular Discovery LLC, Santa Fe, New Mexico, http://www.sunsetmolecular.com/, 2004.

  • J.L. Durant, B.A. Leland, D.R. Henry, J.G. Nourse (2002) J. Chem. Inf. Comput. Sci., 42, 1273

  • SMARTS, Daylight Chemical Information Systems, Santa Fe, New Mexico, http://www.daylight.com/dayhtml/doc/theory.smarts.html; online SMARTS tutorial: http://www.daylight.com/dayhtml/doc/theory.smarts.html, 2004.

  • C.A. Lipinski, F. Lombardo, B.W. Dominy, P.J. Feeney (1997) Adv. Drug Delivery Rev., 23, 3

  • J. MacCuish, N. MacCuish, Measures Software, Mesa Analytics and Computing LLC, Santa Fe, New Mexico.

  • G. Schneider, W. Neidhart, T. Giller, G. Schmidt (1999) Angew. Chem. Int. Ed. Engl., 38, 2894

  • E. Byvatov, U. Fechner, J. Sadowski, G. Schneider (2003) J. Chem. Inf. Comput. Sci., 43, 1882

  • Daylight Toolkit v4.81, Daylight Chemical Information Systems, Santa Fe, New Mexico, http://www.daylight.com/, 2003.

  • OEChem v1.2, OpenEye Scientific Software, Santa Fe, New Mexico, http://www.eyesopen.com/, 2004.

  • S. Wold, A. Ruhe, H. Wold, W.J. Dunn III (1984) SIAM J. Sci. Stat. Comput., 5, 735

  • J. Trygg (2001) Parsimonious Multivariate Models, Umetrics Academy, Umeå

  • A. Höskuldsson (1988) J. Chemometr., 2, 211

  • R.D. Cramer, J.D. Bunce, D.E. Patterson, I.E. Frank (1988) Quant. Struct.-Act. Relat., 7, 18

  • S. Wold (1978) Technometrics, 20, 397

  • Statistical parameters are described in the SIMCA user manual; the software is available from Umetrics, Umeå, Sweden, web site: http://www.umetrics.com/.

  • L. Eriksson, E. Johansson, N. Kettaneh-Wold, S. Wold (2001) Multi- and Megavariate Data Analysis. Principles and Applications, Umetrics Academy, Umeå

  • E. Zhu, R.M. Barnes (1995) J. Chemometr., 9, 363

  • These figures are available from the authors upon request.

  • T.I. Oprea, J. Gottfries (2001) J. Comb. Chem., 3, 157

  • T.I. Oprea (2002) J. Braz. Chem. Soc., 13, 811

  • C. Hansch, D. Hoekman, A. Leo, D. Weininger, C.D. Selassie (2002) Chem. Rev., 102, 783

  • By default, for cross-validation the SIMCA-P software divides the original data into 7 groups; see the user manual or the document http://www.umetrics.com/download/KB/Multivariate%20FAQ.pdf, 2004.
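The seven-group cross-validation mentioned in the last note, and the q² ≥ 0.3 criterion used in the abstract, can be sketched as follows, assuming NumPy. `pls1_fit` and `q2` are hypothetical helpers; the systematic assignment of every seventh compound to the same group is an assumption (the SIMCA-P assignment rule is not given here); and q² is computed as 1 - PRESS/SS, with SS the sum of squares of y about its mean. Setting the number of groups to n gives leave-one-out, and setting it to 2 gives leave-half-out, the two alternatives the abstract compares with CV7.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """Fit a PLS1 model by NIPALS and return a prediction function.
    X and y are mean-centered internally using the training data only."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w = w / np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p_a, q_a = E.T @ t / tt, f @ t / tt
        E = E - np.outer(t, p_a)          # deflate X
        f = f - q_a * t                   # deflate y
        W.append(w)
        P.append(p_a)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)   # coefficients for centered X: B = W (P'W)^-1 q
    return lambda X_new: y_mean + (X_new - x_mean) @ b


def q2(X, y, n_comp, n_groups):
    """Cross-validated q2 = 1 - PRESS / SS, SS being the sum of squares of y about its mean."""
    n = len(y)
    groups = np.arange(n) % n_groups      # assumption: compound i goes to group i mod n_groups
    press = 0.0
    for g in range(n_groups):
        test = groups == g
        predict = pls1_fit(X[~test], y[~test], n_comp)
        press += np.sum((y[test] - predict(X[test])) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


# Synthetic series of 35 compounds with 10 descriptors.
rng = np.random.default_rng(1)
n, p = 35, 10
X = rng.normal(size=(n, p))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)

results = {label: q2(X, y, n_comp=2, n_groups=g)
           for label, g in [("LOO", n), ("CV7", 7), ("leave-half-out", 2)]}
for label, value in results.items():
    print(f"{label}: q2 = {value:.3f}")
print("passes q2 >= 0.3 (CV7):", results["CV7"] >= 0.3)
```

Because each held-out group is predicted by a model that never saw it, PRESS grows as larger fractions are withheld, which is why leave-half-out tends to give the most pessimistic q² of the three.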

Author information

Correspondence to Tudor I. Oprea.

About this article

Cite this article

Olah, M., Bologa, C. & Oprea, T.I. An automated PLS search for biologically relevant QSAR descriptors. J Comput Aided Mol Des 18, 437–449 (2004). https://doi.org/10.1007/s10822-004-4060-8

  • DOI: https://doi.org/10.1007/s10822-004-4060-8