Abstract
The use of inaccurate scoring functions in docking algorithms may result in the selection of compounds with high predicted binding affinity that are nevertheless known experimentally not to bind to the target receptor. Such falsely predicted binders have been termed ‘binding decoys’. We asked whether true binders and decoys could be distinguished based only on their structural chemical descriptors, using approaches common in ligand-based drug design. We applied the k-Nearest Neighbor (kNN) classification QSAR approach to a dataset of compounds characterized as binders or binding decoys of AmpC beta-lactamase. Models were subjected to rigorous internal and external validation as part of our standard workflow, and a special QSAR modeling scheme was employed that took into account the imbalanced ratio of inhibitors to non-binders (1:4) in this dataset. We obtained 342 predictive models with a correct classification rate (CCR) of 0.90 or higher for both training and test sets. The prediction accuracy was as high as 100% (CCR = 1.00) for the external validation set of 10 compounds (5 true binders and 5 decoys) selected randomly from the original dataset. For an additional external set of 50 known non-binders, we achieved a CCR of 0.87 using a very conservative model applicability domain threshold. The validated binary kNN QSAR models were further employed for mining the NCGC AmpC screening dataset (69,653 compounds). The consensus prediction for 64 compounds identified as screening hits in the AmpC PubChem assay disagreed with their annotation in PubChem but agreed with the results of secondary assays. At the same time, 15 compounds were identified as potential binders contrary to their annotation in PubChem. Five of them were tested experimentally and showed inhibitory activities in the millimolar range with the highest binding constant Ki of 135 μM.
Our studies suggest that validated QSAR models could complement structure based docking and scoring approaches in identifying promising hits by virtual screening of molecular libraries.
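The abstract reports model quality as a correct classification rate (CCR) but does not define it. A minimal sketch follows, assuming the common definition of CCR as the mean of the per-class fractions of correctly classified compounds (for two classes, the average of sensitivity and specificity). The function names and the toy labels are hypothetical and are not taken from the paper. The sketch illustrates why such a class-averaged measure is useful when inhibitors and non-binders are imbalanced 1:4.

```python
def correct_classification_rate(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true.

    Assumed definition (not stated in the abstract): for binary labels
    this equals 0.5 * (sensitivity + specificity).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    recalls = []
    for cls in sorted(set(y_true)):
        members = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in members if y_pred[i] == cls)
        recalls.append(hits / len(members))
    return sum(recalls) / len(recalls)


def accuracy(y_true, y_pred):
    """Plain fraction of correct predictions, for comparison."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy set with the 1:4 ratio: 2 binders (1), 8 non-binders (0).
y_true = [1, 1] + [0] * 8
always_nonbinder = [0] * 10  # trivial model that never predicts a binder

acc_trivial = accuracy(y_true, always_nonbinder)
ccr_trivial = correct_classification_rate(y_true, always_nonbinder)

# A balanced external set (5 binders, 5 decoys) predicted perfectly.
y_ext = [1] * 5 + [0] * 5
ccr_perfect = correct_classification_rate(y_ext, list(y_ext))

print(acc_trivial, ccr_trivial, ccr_perfect)
```

Under this assumed definition, the trivial model scores 0.8 in plain accuracy but only 0.5 in CCR, so a high CCR cannot be reached by ignoring the minority class.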
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10822-008-9199-2/MediaObjects/10822_2008_9199_Fig1_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10822-008-9199-2/MediaObjects/10822_2008_9199_Fig2_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10822-008-9199-2/MediaObjects/10822_2008_9199_Fig3_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10822-008-9199-2/MediaObjects/10822_2008_9199_Fig4_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10822-008-9199-2/MediaObjects/10822_2008_9199_Fig5_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10822-008-9199-2/MediaObjects/10822_2008_9199_Fig6_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10822-008-9199-2/MediaObjects/10822_2008_9199_Fig7_HTML.gif)
Acknowledgements
We would like to thank Drs. Brian Shoichet and John Irwin for providing the AmpC dataset and fruitful discussions. We also acknowledge the access to the computing facilities at the ITS Research Computing Division of the University of North Carolina at Chapel Hill. The studies reported in this paper were supported in part by the NIH research grant GM066940 and the RoadMap Center planning grant P20-HG003898. Denise Teotico was supported by NIH grants GM71630 and GM59957.
Cite this article
Hsieh, JH., Wang, X.S., Teotico, D. et al. Differentiation of AmpC beta-lactamase binders vs. decoys using classification kNN QSAR modeling and application of the QSAR classifier to virtual screening. J Comput Aided Mol Des 22, 593–609 (2008). https://doi.org/10.1007/s10822-008-9199-2
DOI: https://doi.org/10.1007/s10822-008-9199-2