Automatic Classification of Enzyme Family in Protein Annotation

dos Santos, Cássia T.; Bazzan, Ana L. C.; Lemke, Ney

doi:10.1007/978-3-642-03223-3_8

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5676)

Included in the conference series: Brazilian Symposium on Bioinformatics (BSB)

Abstract

Most of the tasks in genome annotation can be at least partially automated. Because annotation is time-consuming, facilitating parts of the process, and thus freeing the specialist for more valuable tasks, has motivated many annotation tools and environments. In particular, annotation of protein function can benefit from knowledge about enzymatic processes. Sequence homology alone is a poor basis for deriving this knowledge when the sequence to be annotated has only a few homologues; an alternative is to use motifs. This paper uses a symbolic machine learning approach to derive rules that classify enzymes according to the Enzyme Commission (EC) nomenclature. Our results show that, for the top class, the average global classification error is 3.13%. The technique also produces a set of rules relating structural to functional information, which is important for understanding the three-dimensional structure of a protein and determining its biological function.
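The abstract does not name the symbolic learner that produces the rules. As an assumption, the sketch below uses a simplified ID3/C4.5-style decision tree (information gain on binary motif presence/absence features, with no gain ratio or pruning) and flattens the tree into IF-THEN rules predicting the top-level EC class. The motif names and training examples are hypothetical toy values, not the paper's data, and `error_rate` is a plain misclassification rate rather than the paper's averaged global error.

```python
import math
from collections import Counter


def ec_top_class(ec_number: str) -> str:
    """Return the first EC digit, e.g. '3.4.21.4' -> '3' (hydrolases)."""
    return ec_number.split(".")[0]


# Hypothetical toy data: each protein is the set of motif IDs found in its
# sequence, labelled with the top-level class of its (real) EC number.
TRAIN = [
    ({"motif_A", "motif_C"}, ec_top_class("3.4.21.4")),
    ({"motif_A"}, ec_top_class("3.1.1.7")),
    ({"motif_B"}, ec_top_class("2.7.11.1")),
    ({"motif_B", "motif_C"}, ec_top_class("2.7.1.1")),
    ({"motif_C"}, ec_top_class("1.1.1.1")),
]
MOTIFS = set().union(*(motifs for motifs, _ in TRAIN))


def entropy(labels):
    """Shannon entropy (bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_motif(examples, motifs):
    """Return the motif whose presence/absence split has the highest information gain."""
    base = entropy([y for _, y in examples])
    best, best_gain = None, 0.0
    for m in sorted(motifs):
        with_m = [y for x, y in examples if m in x]
        without_m = [y for x, y in examples if m not in x]
        if not with_m or not without_m:
            continue  # split does not separate anything
        remainder = (len(with_m) * entropy(with_m)
                     + len(without_m) * entropy(without_m)) / len(examples)
        if base - remainder > best_gain:
            best, best_gain = m, base - remainder
    return best


def learn_rules(examples, motifs, conditions=()):
    """Grow the tree recursively and return its leaves as (conditions, class) rules."""
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    m = None if len(set(labels)) == 1 else best_motif(examples, motifs)
    if m is None:
        return [(conditions, majority)]
    rest = motifs - {m}
    with_m = [(x, y) for x, y in examples if m in x]
    without_m = [(x, y) for x, y in examples if m not in x]
    return (learn_rules(with_m, rest, conditions + ((m, True),))
            + learn_rules(without_m, rest, conditions + ((m, False),)))


def classify(rules, motif_set):
    """Apply the first rule whose conditions all hold for the given motif set."""
    for conditions, label in rules:
        if all((m in motif_set) == present for m, present in conditions):
            return label
    return None


def error_rate(rules, examples):
    """Fraction of examples whose predicted class differs from the true class."""
    return sum(classify(rules, x) != y for x, y in examples) / len(examples)


def format_rule(conditions, label):
    terms = [f"{m} {'present' if p else 'absent'}" for m, p in conditions]
    return "IF " + (" AND ".join(terms) or "TRUE") + f" THEN EC {label}"


if __name__ == "__main__":
    rules = learn_rules(TRAIN, MOTIFS)
    for conditions, label in rules:
        print(format_rule(conditions, label))
    print("training error:", error_rate(rules, TRAIN))
```

On this toy data the learner yields three rules, the first being `IF motif_A present THEN EC 3`. Flattening the tree into rules is what makes the output readable as motif-to-function statements; a real evaluation would measure error on held-out proteins, not on the training set.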

Author information

Authors and Affiliations

Departamento de Informática, Universidade de Évora, Portugal
Cássia T. dos Santos
Instituto de Informática / PPGC, Universidade Federal do Rio Grande do Sul, C. P. 15064, 91.501-970, Porto Alegre, RS, Brazil
Ana L. C. Bazzan
Dep. de Física e Biofísica, Instituto de Biociências, UNESP, C.P. 510, 18618-000, Botucatu, SP, Brazil
Ney Lemke

Editor information

Editors and Affiliations

Center of Informatics, Federal University of Pernambuco, Av. Prof. Luiz Freire, s/n, Cidade Universitária, Recife, PE 50740-540, Brazil
Katia S. Guimarães
National Library of Medicine, National Institutes of Health, National Center for Biotechnology Information, 8600 Rockville Pike, Building 38A 8S814, Bethesda, MD 20894, USA
Anna Panchenko
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Building 38A 8S814, Bethesda, MD 20894, USA
Teresa M. Przytycka

About this paper

Cite this paper

dos Santos, C.T., Bazzan, A.L.C., Lemke, N. (2009). Automatic Classification of Enzyme Family in Protein Annotation. In: Guimarães, K.S., Panchenko, A., Przytycka, T.M. (eds) Advances in Bioinformatics and Computational Biology. BSB 2009. Lecture Notes in Computer Science, vol 5676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03223-3_8

DOI: https://doi.org/10.1007/978-3-642-03223-3_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03222-6
Online ISBN: 978-3-642-03223-3
eBook Packages: Computer Science, Computer Science (R0)
