Abstract
Most tasks in genome annotation can be at least partially automated. Because annotation is time-consuming, many tools and annotation environments have been motivated by the goal of automating parts of the process, thus freeing the specialist to carry out more valuable tasks. In particular, the annotation of protein function can benefit from knowledge about enzymatic processes. Sequence homology alone is a poor basis for deriving this knowledge when the sequence to be annotated has only a few homologues. The alternative is to use motifs. This paper uses a symbolic machine learning approach to derive rules for classifying enzymes according to the Enzyme Commission (EC) nomenclature. Our results show that, for the top-level class, the average global classification error is 3.13%. Our technique also produces a set of rules relating structural to functional information, which is important for understanding a protein's three-dimensional structure and determining its biological function.
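To make the evaluation target concrete, the following minimal Python sketch shows how a top-level EC class can be read from an EC number and how a global classification error can be computed as the fraction of misclassified examples. It is an illustration only, not the authors' implementation: the function names `ec_top_class` and `classification_error` and the sample labels are hypothetical, and the exact error definition used in the paper is assumed rather than confirmed.

```python
# Illustrative sketch; names and sample data are hypothetical, not from the paper.

# The six top-level EC classes in use when the paper was published (2009).
EC_TOP_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_top_class(ec_number):
    """Return the first field of an EC number such as '3.4.21.4' as an int."""
    text = ec_number.strip()
    # Tolerate an optional leading "EC" prefix, e.g. "EC 3.4.21.4".
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    return int(text.split(".")[0])


def classification_error(true_labels, predicted_labels):
    """Fraction of examples whose predicted label differs from the true label."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(true_labels, predicted_labels) if t != p)
    return wrong / len(true_labels)


if __name__ == "__main__":
    # Hypothetical true EC numbers and hypothetical predicted top-level classes.
    true_classes = [ec_top_class(ec) for ec in ["3.4.21.4", "1.1.1.1", "2.7.1.1", "3.1.3.1"]]
    predicted = [3, 1, 2, 4]
    print(EC_TOP_CLASSES[true_classes[0]])
    print(classification_error(true_classes, predicted))
```

Under this reading, an error of 3.13% would correspond to roughly 3 misclassified enzymes per 100 examples.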
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
dos Santos, C.T., Bazzan, A.L.C., Lemke, N. (2009). Automatic Classification of Enzyme Family in Protein Annotation. In: Guimarães, K.S., Panchenko, A., Przytycka, T.M. (eds) Advances in Bioinformatics and Computational Biology. BSB 2009. Lecture Notes in Computer Science, vol 5676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03223-3_8
DOI: https://doi.org/10.1007/978-3-642-03223-3_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03222-6
Online ISBN: 978-3-642-03223-3
eBook Packages: Computer Science, Computer Science (R0)