Differentiation of AmpC beta-lactamase binders vs. decoys using classification kNN QSAR modeling and application of the QSAR classifier to virtual screening

Journal of Computer-Aided Molecular Design

Abstract

The use of inaccurate scoring functions in docking algorithms may result in the selection of compounds with high predicted binding affinity that are nevertheless known experimentally not to bind to the target receptor. Such falsely predicted binders have been termed ‘binding decoys’. We asked whether true binders and decoys could be distinguished based only on their structural chemical descriptors, using approaches common in ligand-based drug design. We applied the k-Nearest Neighbor (kNN) classification QSAR approach to a dataset of compounds characterized as binders or binding decoys of AmpC beta-lactamase. Models were subjected to rigorous internal and external validation as part of our standard workflow, and a special QSAR modeling scheme was employed to account for the imbalanced ratio of inhibitors to non-binders (1:4) in this dataset. A total of 342 predictive models were obtained with a correct classification rate (CCR) of 0.90 or higher for both training and test sets. The prediction accuracy reached 100% (CCR = 1.00) for the external validation set of 10 compounds (5 true binders and 5 decoys) selected randomly from the original dataset. For an additional external set of 50 known non-binders, we achieved a CCR of 0.87 using a very conservative model applicability domain threshold. The validated binary kNN QSAR models were further employed for mining the NCGC AmpC screening dataset (69653 compounds). The consensus prediction for 64 compounds identified as screening hits in the AmpC PubChem assay disagreed with their annotation in PubChem but agreed with the results of secondary assays. At the same time, 15 compounds were identified as potential binders contrary to their annotation in PubChem. Five of them were tested experimentally and showed inhibitory activities in the millimolar range, with the highest binding constant Ki of 135 μM. Our studies suggest that validated QSAR models could complement structure-based docking and scoring approaches in identifying promising hits by virtual screening of molecular libraries.
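The abstract reports model quality as CCR but does not define it. The sketch below assumes the commonly used balanced form, the mean of sensitivity and specificity; whether the paper uses exactly this formula is an assumption. The function name and the label vectors are hypothetical and serve only to show why plain accuracy can mislead on a 1:4 imbalanced set.

```python
def correct_classification_rate(y_true, y_pred):
    """Assumed definition: CCR = 0.5 * (sensitivity + specificity).

    Labels are binary: 1 = binder, 0 = decoy (non-binder).
    """
    pairs = list(zip(y_true, y_pred))
    n_pos = sum(1 for t, _ in pairs if t == 1)
    n_neg = sum(1 for t, _ in pairs if t == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("this CCR form needs at least one compound of each class")
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    true_neg = sum(1 for t, p in pairs if t == 0 and p == 0)
    return 0.5 * (true_pos / n_pos + true_neg / n_neg)


def accuracy(y_true, y_pred):
    """Fraction of compounds classified correctly, ignoring class balance."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical 1:4 set: 2 binders, 8 decoys.
y_true = [1, 1] + [0] * 8
# A trivial model that calls every compound a decoy.
y_all_decoy = [0] * 10

print(accuracy(y_true, y_all_decoy))                     # 0.8
print(correct_classification_rate(y_true, y_all_decoy))  # 0.5
```

Under this definition a model that labels everything as a decoy scores 0.8 on accuracy but only 0.5 on CCR, which is why a class-balanced metric suits a dataset with a 1:4 binder-to-decoy ratio.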

Acknowledgements

We would like to thank Drs. Brian Shoichet and John Irwin for providing the AmpC dataset and for fruitful discussions. We also acknowledge access to the computing facilities at the ITS Research Computing Division of the University of North Carolina at Chapel Hill. The studies reported in this paper were supported in part by the NIH research grant GM066940 and the RoadMap Center planning grant P20-HG003898. Denise Teotico was supported by NIH grants GM71630 and GM59957.

Author information

Corresponding author

Correspondence to Alexander Tropsha.

About this article

Cite this article

Hsieh, JH., Wang, X.S., Teotico, D. et al. Differentiation of AmpC beta-lactamase binders vs. decoys using classification kNN QSAR modeling and application of the QSAR classifier to virtual screening. J Comput Aided Mol Des 22, 593–609 (2008). https://doi.org/10.1007/s10822-008-9199-2

  • DOI: https://doi.org/10.1007/s10822-008-9199-2
