Abstract
The goal of a quantitative structure–activity relationship (QSAR) model is to encode the relationship between molecular structure and biological activity or physical property. Based on this encoding, such models can be used for predictive purposes. Assuming the use of relevant and meaningful descriptors, and a statistically significant model, extraction of the encoded structure–activity relationships (SARs) can provide insight into what makes a molecule active or inactive. Such analyses by QSAR models are useful in a number of scenarios, such as suggesting structural modifications to enhance activity, explaining outliers, and exploratory analysis of novel SARs. In this paper we discuss the need for interpretation and give an overview of the factors that affect the interpretability of QSAR models. We then describe interpretation protocols for different types of models, highlighting the different types of interpretations, which range from very broad, global trends to very specific, case-by-case descriptions of the SAR, using examples from the training set. Finally, we discuss a number of case studies where workers have provided some form of interpretation of a QSAR model.



Notes
It should be noted that even though the problem may be understood from a mechanistic point of view, it is still possible for one to derive poor QSAR models, since the numerical characterization of the mechanistic features responsible for the property might be inaccurate or incomplete.
Acknowledgments
I would like to thank Prof. Gerald Maggiora and Dr. David Stanton for useful comments on the issues underlying interpretability.
Cite this article
Guha, R. On the interpretation and interpretability of quantitative structure–activity relationship models. J Comput Aided Mol Des 22, 857–871 (2008). https://doi.org/10.1007/s10822-008-9240-5
DOI: https://doi.org/10.1007/s10822-008-9240-5