Consensus of Sample-Balanced Classifiers for Identifying Ligand-Binding Residue by Co-evolutionary Physicochemical Characteristics of Amino Acids

Chen, Peng

doi:10.1007/978-3-642-39678-6_35

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 375)

Abstract

Protein-ligand binding is an important mechanism by which some proteins perform their functions, and ligand-binding sites are the protein residues that physically bind to ligands. To date, state-of-the-art methods search for known structures similar to the query protein and predict its binding sites from those solved structures. However, such structural information is not commonly available. In this paper, we propose a sequence-based approach to identifying protein-ligand binding residues. Because ligand-binding residues are heavily outnumbered by non-ligand-binding residues, we constructed several balanced data sets and trained a random forest (RF)-based classifier on each of them. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor. Experimental results on CASP9 targets demonstrate that our method compares favorably with state-of-the-art methods.
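The balancing-and-consensus scheme described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation. It makes the following assumptions:

- Binding residues carry label 1 and non-binding residues label 0.
- The majority class is split into roughly minority-sized chunks. The paper's exact subset construction is not given here.
- A nearest-centroid rule stands in for the random forest, so the sketch needs only the Python standard library.
- Members are combined by simple majority vote.

The names `balanced_subsets`, `train_nearest_centroid`, and `train_consensus` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Classifier = Callable[[Vector], int]
Dataset = Tuple[List[Vector], List[int]]


def balanced_subsets(X: List[Vector], y: List[int], seed: int = 0) -> List[Dataset]:
    """Split the majority class (label 0, non-binding) into chunks roughly the size
    of the minority class (label 1, binding) and pair each chunk with all positives."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    random.Random(seed).shuffle(neg)
    k = max(1, len(neg) // len(pos))  # number of balanced subsets
    subsets = []
    for i in range(k):
        chunk = neg[i::k]  # strided slices use every negative exactly once
        subsets.append((pos + chunk, [1] * len(pos) + [0] * len(chunk)))
    return subsets


def train_nearest_centroid(X: List[Vector], y: List[int]) -> Classifier:
    """Hypothetical stand-in for the random forest: predict the class whose
    training centroid is closer to the query vector."""
    def centroid(points: List[Vector]) -> List[float]:
        return [sum(col) / len(points) for col in zip(*points)]

    c_pos = centroid([x for x, label in zip(X, y) if label == 1])
    c_neg = centroid([x for x, label in zip(X, y) if label == 0])

    def predict(x: Vector) -> int:
        return int(math.dist(x, c_pos) < math.dist(x, c_neg))

    return predict


def train_consensus(X, y, train=train_nearest_centroid, seed=0) -> Classifier:
    """Train one member per balanced subset and combine them by majority vote."""
    members = [train(Xs, ys) for Xs, ys in balanced_subsets(X, y, seed)]

    def predict(x: Vector) -> int:
        votes = sum(member(x) for member in members)
        return int(2 * votes > len(members))  # a strict majority must vote "binding"

    return predict


if __name__ == "__main__":
    # Toy data: 2 binding residues, 8 non-binding residues, 2 features each.
    X = [(1.0, 1.0), (0.9, 1.1)] + [(0.1 * i, -0.1 * i) for i in range(8)]
    y = [1, 1] + [0] * 8
    model = train_consensus(X, y)
    print(model((1.0, 1.0)), model((0.0, 0.0)))  # prints: 1 0
```

Each member sees a roughly 1:1 class ratio, while the ensemble as a whole still uses every non-binding example. To use a real random forest, pass a different `train` function that returns a callable predictor.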

This work was supported by Award Numbers KUS-CI-016-04 and GRP-CF-2011-19-P-Gao-Huang, made by King Abdullah University of Science and Technology (KAUST).

Author information

Authors and Affiliations

Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
Peng Chen

Editor information

Editors and Affiliations

Machine Learning and Systems Biology Laboratory, School of Electronics and Information Engineering, Tongji University, Shanghai, China
De-Shuang Huang
Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, 208 016, Kanpur, India
Phalguni Gupta
Tsinghua University, Beijing, China
Ling Wang
Department of Biotechnology, Indian Institute of Technology Madras, 600 036, Chennai, Tamilnadu, India
Michael Gromiha

About this paper

Cite this paper

Chen, P. (2013). Consensus of Sample-Balanced Classifiers for Identifying Ligand-Binding Residue by Co-evolutionary Physicochemical Characteristics of Amino Acids. In: Huang, DS., Gupta, P., Wang, L., Gromiha, M. (eds) Emerging Intelligent Computing Technology and Applications. ICIC 2013. Communications in Computer and Information Science, vol 375. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39678-6_35

DOI: https://doi.org/10.1007/978-3-642-39678-6_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39677-9
Online ISBN: 978-3-642-39678-6
eBook Packages: Computer Science, Computer Science (R0)