Abstract
This paper addresses protein contact map prediction, one of the most important intermediate steps in the protein folding problem. We describe a method that predicts protein contact maps with decision trees. The input encoding is derived from all amino acid pairs that occur in the training data set. The resulting model consists of 400 decision trees, one for each possible amino acid pair, and each tree takes into account the amino acid frequencies in the subsequence lying between the two amino acids under analysis. To evaluate the method's generalization capability, we carry out an experiment on 173 non-homologous proteins of known structure, selected from the Protein Data Bank (PDB). Our results indicate that the method assigns protein contacts with an average accuracy of 0.34, compared with the 0.25 obtained by the FNETCSS method. These results suggest that the algorithm improves on the accuracy of the compared methods, particularly as protein length increases.
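The input encoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_separation` parameter, and the use of relative (rather than absolute) frequencies are assumptions made for illustration. The sketch only shows how each residue pair could be routed to one of 400 per-pair models and described by the amino acid frequencies of the subsequence between the two residues; training the decision trees themselves is left out.

```python
from collections import Counter, defaultdict
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# One model slot per ordered residue pair: 20 x 20 = 400 keys.
PAIR_KEYS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def between_frequencies(sequence, i, j):
    """Relative frequency of each amino acid strictly between positions i and j.

    Assumption: frequencies are normalised by the segment length; the
    abstract does not say whether raw counts or ratios are used.
    """
    if not 0 <= i < j < len(sequence):
        raise ValueError("expected 0 <= i < j < len(sequence)")
    segment = sequence[i + 1:j]
    counts = Counter(segment)
    n = len(segment)
    return [counts[a] / n if n else 0.0 for a in AMINO_ACIDS]


def encode_pairs(sequence, min_separation=2):
    """Yield (pair_key, i, j, features) for residue pairs in the sequence.

    min_separation is a hypothetical parameter that skips adjacent residues.
    """
    for i in range(len(sequence)):
        for j in range(i + min_separation, len(sequence)):
            key = sequence[i] + sequence[j]
            yield key, i, j, between_frequencies(sequence, i, j)


def group_by_pair(sequence, min_separation=2):
    """Collect feature rows per pair key, one bucket per would-be tree."""
    buckets = defaultdict(list)
    for key, _i, _j, features in encode_pairs(sequence, min_separation):
        buckets[key].append(features)
    return dict(buckets)


if __name__ == "__main__":
    for key, rows in sorted(group_by_pair("MKTAYIAK").items()):
        print(key, len(rows))
```

Under these assumptions, each bucket returned by `group_by_pair` would serve as the training rows for the decision tree associated with that amino acid pair, once every row is labeled with the observed contact or non-contact state from a known structure.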
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Santiesteban-Toca, C.E., Aguilar-Ruiz, J.S. (2011). DTP: Decision Tree-Based Predictor of Protein Contact Map. In: Mehrotra, K.G., Mohan, C.K., Oh, J.C., Varshney, P.K., Ali, M. (eds) Modern Approaches in Applied Intelligence. IEA/AIE 2011. Lecture Notes in Computer Science, vol 6704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21827-9_38
DOI: https://doi.org/10.1007/978-3-642-21827-9_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21826-2
Online ISBN: 978-3-642-21827-9
eBook Packages: Computer Science, Computer Science (R0)