Abstract
Quantitative structure–activity relationship (QSAR) modeling is a regression methodology that quantitatively correlates structural features with the observed behavior of a series of congeneric molecules. It has been widely used to evaluate the activity, toxicity and properties of small-molecule compounds such as drugs, toxicants and surfactants. Surprisingly, this useful technique has found only very limited application to biomacromolecules, even though solved atomic-resolution 3D structures of proteins, nucleic acids and their complexes have accumulated rapidly in recent decades. Here, we present a proof-of-concept paradigm for modeling, predicting and interpreting the binding affinity of 144 sequence-nonredundant protein complexes with available structures and known affinities (Kastritis et al. Protein Sci 20:482–491, 2011) using a biomacromolecular QSAR (BioQSAR) scheme. We demonstrate that the modeling performance and predictive power of BioQSAR are comparable to or even better than those of traditional knowledge-based strategies, mechanism-type methods and empirical scoring algorithms, while BioQSAR offers additional advantages over these methods, such as adaptability, interpretability, deep validation and high efficiency. The BioQSAR scheme could be readily modified to infer the biological behavior and functions of other biomacromolecules, provided that their X-ray crystal structures, NMR conformational ensembles or computationally modeled structures are available.
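To make the idea of a QSAR regression concrete, the following minimal sketch fits a linear model that maps structural descriptors to an affinity value and scores it by leave-one-out cross-validation. It is an illustration only and not the BioQSAR implementation: ordinary least squares stands in for whatever regression method is used in practice, and the two descriptors, the six "complexes" and their affinity values are invented, noise-free toy numbers (all names such as `DESCRIPTORS`, `AFFINITIES`, `fit_linear` and `loo_q2` are hypothetical).

```python
from typing import List, Sequence

# Hypothetical toy data (NOT from the benchmark): two descriptors per complex
# and an "affinity" generated exactly as 1 + 2*d1 - 0.5*d2, with no noise.
DESCRIPTORS = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0), (6.0, 4.0)]
AFFINITIES = [2.0, 4.5, 4.5, 7.5, 7.0, 11.0]


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(x: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares fit of y = w0 + w1*d1 + ... via the normal equations."""
    rows = [[1.0] + list(r) for r in x]  # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(w: Sequence[float], row: Sequence[float]) -> float:
    """Predicted response for one descriptor vector."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


def loo_q2(x: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        # Refit the model without sample i, then predict sample i.
        x_train = [r for k, r in enumerate(x) if k != i]
        y_train = [t for k, t in enumerate(y) if k != i]
        w = fit_linear(x_train, y_train)
        press += (y[i] - predict(w, x[i])) ** 2
    ss_total = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - press / ss_total


if __name__ == "__main__":
    print("weights:", fit_linear(DESCRIPTORS, AFFINITIES))
    print("LOO q2:", loo_q2(DESCRIPTORS, AFFINITIES))
```

Because the toy affinities are an exact linear function of the descriptors, the fit recovers the generating coefficients and the cross-validated q² is essentially 1; real data would give lower values, and q² is computed here against the overall mean response, which is one of several conventions in use.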




References
Hansch C, Fujita T (1964) ρ-σ-π analysis. A method for correlation of biological activity and chemical structure. J Am Chem Soc 86:1616–1626
Katritzky AR, Lobanov VS, Karelson M (1995) QSPR: the correlation and quantitative prediction of chemical and physical properties from structure. Chem Soc Rev 24:279–287
Siraki AG, Chevaldina T, Moridani MY, O’Brien PJ (2004) Quantitative structure–toxicity relationships by accelerated cytotoxicity mechanism screening. Curr Opin Drug Discov Devel 7:118–125
Mozrzymas A, Rózycka-Roszak B (2010) Prediction of critical micelle concentration of nonionic surfactants by a quantitative structure–property relationship. Comb Chem High Throughput Screen 13:39–44
Fourches D, Pu D, Tassa C, Weissleder R, Shaw SY, Mumper RJ, Tropsha A (2010) Quantitative nanostructure–activity relationship modeling. ACS Nano 4:5703–5712
Natesan S, Wang T, Lukacova V, Bartus V, Khandelwal A, Subramaniam R, Balaz S (2012) Cellular quantitative structure–activity relationship (Cell-QSAR): conceptual dissection of receptor binding and intracellular disposition in antifilarial activities of Selwood antimycins. J Med Chem 55:3699–3712
Martin E, Mukherjee P, Sullivan D, Jansen J (2011) Profile-QSAR: a novel meta-QSAR method that combines activities across the kinase family to accurately predict affinity, selectivity, and cellular activity. J Chem Inf Model 51:1942–1956
Winkler DA (2002) The role of quantitative structure–activity relationships (QSAR) in biomolecular discovery. Brief Bioinform 3:73–86
Zhou P, Tian F, Wu Y, Li Z, Shang Z (2008) Quantitative sequence–activity model (QSAM): applying QSAR strategy to model and predict bioactivity and function of peptides, proteins and nucleic acids. Curr Comput Aided Drug Des 4:311–321
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The protein data bank. Nucleic Acids Res 28:235–242
Concu R, Podda G, González-Díaz H (2009) In quantitative structure-property relationships from bio-molecular to social networks. Nova Science Publisher, New York
González-Díaz H, Vilar S, Santana L, Uriarte E (2007) Medicinal chemistry and bioinformatics — current trends in drugs discovery with networks topological indices. Curr Top Med Chem 7:1025–1039
González-Díaz H, Prado-Prado F, Perez-Montoto LG, Duardo-Sanchez A, Lopez-Diaz A (2009) QSAR models for proteins of parasitic organisms, plants and human guests: theory, applications, legal protection, taxes, and regulatory issues. Curr Proteomics 6:214–227
Munteanu CR, González-Díaz H, Magalhaes AL (2008) Enzymes/non-enzymes classification model complexity based on composition, sequence, 3D and topological indices. J Theor Biol 254:476–482
González-Díaz H, Agüero-Chapin G, Varona J, Molina R, Delogu G, Santana L, Uriarte E, Gianni P (2007) 2D-RNAcoupling numbers: a new computational chemistry approach to link secondary structure topology with biological function. J Comput Chem 28:1049–1056
Munteanu CR, Vázquez JM, Dorado J, Pazos-Sierra A, Sánchez-González A, Prado-Prado FJ, González-Díaz H (2009) Complex network spectral moments for ATCUN motif DNA cleavage: first predictive study on proteins of human pathogen parasites. J Proteome Res 8:5219–5228
Neuvirth H, Raz R, Schreiber G (2004) ProMate: a structure based prediction program to identify the location of protein–protein binding sites. J Mol Biol 338:181–199
Tian F, Lv Y, Yang L (2012) Structure-based prediction of protein–protein binding affinity with consideration of allosteric effect. Amino Acids 43:531–543
Heuser P, Schomburg D (2007) Combination of scoring schemes for protein docking. BMC Bioinformatics 8:279
Kastritis PL, Bonvin AM (2010) Are scoring functions in protein–protein docking ready to predict interactomes? Clues from a novel binding affinity benchmark. J Proteome Res 9:2216–2225
Kastritis PL, Moal IH, Hwang H, Weng Z, Bates PA, Bonvin AM, Janin J (2011) A structure-based benchmark for protein–protein binding affinity. Protein Sci 20:482–491
Park C, Marqusee S (2004) Analysis of the stability of multimeric proteins by effective ΔG and effective m-values. Protein Sci 13:2553–2558
Li H, Robertson AD, Jensen JH (2005) Very fast empirical prediction and rationalization of protein pKa values. Proteins 61:704–721
Word JM, Lovell SC, Richardson JS, Richardson DC (1999) Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation. J Mol Biol 285:1735–1747
Krivov GG, Shapovalov MV, Dunbrack RL (2009) Improved prediction of protein side-chain conformations with SCWRL4. Proteins 77:778–795
Zhou P, Zou J, Tian F, Shang Z (2009) Fluorine bonding: how does it work in protein–ligand interactions? J Chem Inf Model 49:2344–2355
Tian F, Lv Y, Zhou P, Yang L (2011) Characterization of PDZ domain–peptide interactions using an integrated protocol of QM/MM, PB/SA, and CFEA analyses. J Comput Aided Mol Des 25:947–958
Zhou P, Tian F, Ren Y, Shang Z (2010) Systematic classification and analysis of themes in protein–DNA recognition. J Chem Inf Model 50:1476–1488
Siggers TW, Silkov A, Honig B (2005) Structural alignment of protein–DNA interfaces: insights into the determinants of binding specificity. J Mol Biol 345:1027–1045
McDonald IK, Thornton JM (1994) Satisfying hydrogen bonding potential in proteins. J Mol Biol 238:777–793
Word JM, Lovell SC, LaBean TH, Taylor HC, Zalis ME, Presley BK, Richardson JS, Richardson DC (1999) Visualizing and quantifying molecular goodness-of-fit: small-probe contact dots with explicit hydrogen atoms. J Mol Biol 285:1711–1733
Wold S, Sjostrom M, Eriksson L (2001) PLS-regression: a basic tool of chemometrics. Chemometr Intel Lab Syst 58:109–130
Stanton DT (2012) QSAR and QSPR model interpretation using partial least squares (PLS) analysis. Curr Comput Aided Drug Des 8:107–127
Cortes C, Vapnik V (1995) Support vector networks. Mach Learn 20:273–293
Zhou P, Xiang C, Wu Y, Shang Z (2010) Gaussian process: an alternative approach for QSAM modeling of peptides. Amino Acids 38:199–212
Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. MIT Press, Cambridge
Ren Y, Chen X, Feng M, Wang Q, Zhou P (2011) Gaussian process: a promising approach for the modeling and prediction of peptide binding affinity to MHC proteins. Protein Pept Lett 18:670–678
Obrezanova O, Csanyi G, Gola JMR, Segall MD (2007) Gaussian processes: a method for automatic QSAR modeling of ADME properties. J Chem Inf Model 47:1847–1857
Wolfe P (1969) Convergence conditions for ascent methods. SIAM Rev 11:226–235
Zhou P, Tian F, Lv F, Shang Z (2009) Comprehensive comparison of eight statistical modelling methods used in quantitative structure–retention relationship studies for liquid chromatographic retention times of peptides generated by protease digestion of the Escherichia coli proteome. J Chromatogr A 1216:3107–3116
Cho SJ, Hermsmeier MA (2002) Genetic algorithm guided selection: variable selection and subset selection. J Chem Inf Comput Sci 42:927–936
Zhou P, Tian F, Chen X, Shang Z (2008) Modeling and prediction of binding affinities between the human amphiphysin SH3 domain and its peptide ligands using genetic algorithm-Gaussian processes. Biopolymers (Pept Sci) 90:792–802
Tian F, Yang L, Lv F, Yang Q, Zhou P (2009) In silico quantitative prediction of peptides binding affinity to human MHC molecule: an intuitive quantitative structure–activity relationship approach. Amino Acids 36:535–554
Golbraikh A, Tropsha A (2002) Beware of q²! J Mol Graph Model 20:269–276
Baroni M, Clementi S, Cruciani G, Kettaneh-Wold N, Wold S (1993) D-optimal designs in QSAR. Quant Struct Act Relat 12:225–231
Filzmoser P, Liebmann B, Varmuza K (2009) Repeated double cross validation. J Chemometr 23:160–171
Tian F, Zhang C, Fan X, Yang X, Wang X, Liang H (2010) Predicting the flexibility profile of ribosomal RNAs. Mol Inf 29:707–715
Ren Y, Wu B, Pan Y, Lv F, Kong X, Luo X, Li Y, Yang Q (2011) Characterization of the binding profile of peptide to transporter associated with antigen processing (TAP) using Gaussian process regression. Comput Biol Med 41:865–870
He P, Wu W, Wang HD, Yang K, Liao KL, Zhang W (2010) Toward quantitative characterization of the binding profile between the human amphiphysin-1 SH3 domain and its peptide ligands. Amino Acids 38:1209–1218
Acharya KR, Lloyd MD (2005) The advantages and limitations of protein crystal structures. Trends Pharm Sci 26:10–14
Dominguez C, Boelens R, Bonvin AM (2003) HADDOCK: a protein–protein docking approach based on biochemical or biophysical information. J Am Chem Soc 125:1731–1737
Chen R, Li L, Weng Z (2003) ZDOCK: an initial-stage protein-docking algorithm. Proteins 52:80–87
Zhang C, Liu S, Zhu Q, Zhou Y (2005) A knowledge-based energy function for protein–ligand, protein–protein, and protein–DNA complexes. J Med Chem 48:2325–2335
Jorgensen WL, Maxwell DS, Tirado-Rives J (1996) Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. J Am Chem Soc 118:11225–11236
Ponder JW, Richards FM (1987) An efficient newton-like method for molecular mechanics energy minimization of large molecules. J Comput Chem 8:1016–1024
Qiu D, Shenkin PS, Hollinger FP, Still WC (1997) The GB/SA continuum model for solvation. A fast analytical method for the calculation of approximate Born radii. J Phys Chem 101:3005–3014
Almlöf M, Brandsdal BO, Aqvist J (2004) Binding affinity prediction with different force fields: examination of the linear interaction energy method. J Comput Chem 25:1242–1254
Khoruzhii O, Donchev AG, Galkin N, Illarionov A, Olevanov M, Ozrin V, Queen C, Tarasov V (2008) Application of a polarizable force field to calculations of relative protein–ligand binding affinities. Proc Natl Acad Sci USA 105:10378–10383
Liu S, Zhang C, Zhou H, Zhou Y (2004) A physical reference state unifies the structure-derived potential of mean force for protein folding and binding. Proteins 56:93–101
Zhou H, Zhou Y (2002) Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Sci 11:2714–2726
Biela A, Sielaff F, Terwesten F, Heine A, Steinmetzer T, Klebe G (2012) Ligand binding stepwise disrupts water network in thrombin: enthalpic and entropic changes reveal classical hydrophobic effect. J Med Chem 55:6094–6110
Freire E (2009) ITC: affinity is not everything. Eur Pharm Rev 14:44–47
Moreira IS, Fernandes PA, Ramos MJ (2007) Hot spots: a review of the protein–protein interface determinant amino-acid residues. Proteins 68:803–812
Kortemme T, Baker D (2002) A simple physical model for binding energy hot spots in protein–protein complexes. Proc Natl Acad Sci USA 99:14116–14121
Ofran Y, Rost B (2007) Protein–protein interaction hotspots carved into sequences. PLoS Comput Biol 3:e119
Xu D, Tsai CJ, Nussinov R (1997) Hydrogen bonds and salt bridges across protein–protein interfaces. Protein Eng 10:999–1012
Lo Conte L, Chothia C, Janin J (1999) The atomic structure of protein–protein recognition sites. J Mol Biol 285:2177–2198
Petsalaki E, Russell RB (2008) Peptide-mediated interactions in biological systems: new discoveries and applications. Curr Opin Biotech 19:344–350
Tsai CJ, Nussinov R (1997) Hydrophobic folding units at protein–protein interfaces: implications to protein folding and to protein–protein association. Protein Sci 6:1426–1437
Young L, Jernigan RL, Covell DG (1994) A role for surface hydrophobicity in protein–protein recognition. Protein Sci 3:717–729
Tuffery P, Derreumaux P (2012) Flexibility and binding affinity in protein–ligand, protein–protein and multi-component protein interactions: limitations of current computational approaches. J R Soc Interface 9:20–33
Burnett JC, Kellogg GE, Abraham DJ (2000) Computational methodology for estimating changes in free energies of biomolecular association upon mutation. The importance of bound water in dimer-tetramer assembly for beta 37 mutant hemoglobins. Biochemistry 39:1622–1633
Jiang L, Kuhlman B, Kortemme T, Baker D (2005) A “solvated rotamer” approach to modeling water-mediated hydrogen bonds at protein–protein interfaces. Proteins 58:893–904
Kumar S, Nussinov R (2002) Close-range electrostatic interactions in proteins. ChemBioChem 3:604–617
Missimer JH, Steinmetz MO, Baron R, Winkler FK, Kammerer RA, Daura X, van Gunsteren WF (2007) Configurational entropy elucidates the role of salt-bridge networks in protein thermostability. Protein Sci 16:1349–1359
Kumar S, Wolfson HJ, Nussinov R (2001) Protein flexibility and electrostatic interactions. IBM J Res Dev 45:499–512
Marqusee S, Sauer RT (1994) Contributions of a hydrogen bond/salt bridge network to the stability of secondary and tertiary structure in lambda repressor. Protein Sci 3:2217–2225
Zhou P, Tian F, Shang Z (2009) 2D depiction of nonbonding interactions for protein complexes. J Comput Chem 30:940–951
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 31200993), the Fundamental Research Funds for the Central Universities (No. ZYGX2012J111), the Young Teacher Doctoral Discipline Fund of Ministry of Education of China (No. 20120185120025) and the scientific research funds of UESTC.
Cite this article
Zhou, P., Wang, C., Tian, F. et al. Biomacromolecular quantitative structure–activity relationship (BioQSAR): a proof-of-concept study on the modeling, prediction and interpretation of protein–protein binding affinity. J Comput Aided Mol Des 27, 67–78 (2013). https://doi.org/10.1007/s10822-012-9625-3
DOI: https://doi.org/10.1007/s10822-012-9625-3