Prediction of Enzyme Class by Using Reactive Motifs Generated from Binding and Catalytic Sites

Liewlom, Peera; Rakthanmanon, Thanawin; Waiyamai, Kitsana

doi:10.1007/978-3-540-73871-8_41

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4632)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

The purpose of this research is to search for motifs located directly at binding and catalytic sites, called reactive motifs, and then to predict enzyme functions from the discovered reactive motifs. The main challenge is that binding or catalytic site data are available for only about 3.34% of all enzymes, and many of these entries provide only one sequence record. A further challenge is the complexity of the motif combinations needed to predict enzyme functions.

In this paper, we introduce a unique process that combines statistics with biochemistry background knowledge to determine reactive motifs. It consists of three procedures: block scan filter, mutation control, and reactive site-group define. The block scan filter uses a similarity score to transform each 1-sequence record of a binding or catalytic site into quality blocks. These blocks are the input to mutation control, where the amino acids at each position of the sequences are analyzed and extended to determine the complete substitution group. The output of the mutation control step is a set of motifs for each 1-sequence record input. These motifs are then grouped using the reactive site-group define procedure to produce reactive motifs. The reactive motifs, together with a known enzyme sequence dataset, are used as input to the C4.5 learning algorithm to obtain an enzyme prediction model. The accuracy of this model is checked against a testing dataset. With 235 enzyme function classes, the reactive motifs yield the best prediction result with C4.5, an accuracy of 72.58%, which is better than that obtained with PROSITE motifs.
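The way motifs can feed a classifier may be easier to picture with a small sketch. The Python below is an illustration under stated assumptions, not the authors' implementation: it treats a motif as one set of allowed amino acids per position, builds that set naively as the residues observed in each column of a block (the paper's mutation control extends substitution groups using biochemistry knowledge, which is not reproduced here), and turns motif matches into binary features of the kind a decision-tree learner such as C4.5 could consume. The function names and the toy sequences are hypothetical.

```python
from typing import List, Set

# Assumption: a motif is modelled as one set of allowed residues per position.
Motif = List[Set[str]]


def motif_from_block(block: List[str]) -> Motif:
    """Collect the residues observed in each column of an aligned block.

    This is a naive stand-in for mutation control: it only records what is
    seen in the block and does not extend the substitution groups.
    """
    if not block:
        raise ValueError("block must contain at least one sequence")
    width = len(block[0])
    if any(len(seq) != width for seq in block):
        raise ValueError("all sequences in a block must have the same length")
    return [{seq[i] for seq in block} for i in range(width)]


def matches(motif: Motif, sequence: str) -> bool:
    """Return True if some window of the sequence fits the motif."""
    width = len(motif)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if all(res in allowed for res, allowed in zip(window, motif)):
            return True
    return False


def feature_vector(motifs: List[Motif], sequence: str) -> List[int]:
    """One binary feature per motif: 1 if the motif occurs in the sequence."""
    return [1 if matches(m, sequence) else 0 for m in motifs]


# Toy data, invented for illustration only.
block = ["GDSGG", "GDSAG", "GESGG"]
motif = motif_from_block(block)
features_hit = feature_vector([motif], "MKGESAGPL")   # contains GESAG
features_miss = feature_vector([motif], "MKGDTGGPL")  # T is not allowed at position 3
```

Each protein sequence then becomes a row of 0/1 values, one column per reactive motif, with the enzyme class as the label. Training the tree itself is left out here because it depends on the chosen toolkit.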

Author information

Authors and Affiliations

Data Analysis and Knowledge Discovery Laboratory (DAKDL), Computer Engineering Department, Engineering Faculty, Kasetsart University, Bangkok, Thailand
Peera Liewlom, Thanawin Rakthanmanon & Kitsana Waiyamai

Editor information

Editors and Affiliations

Computer Science Department, University of Calgary, Calgary, AB, Canada
Reda Alhajj
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Hong Gao
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Jianzhong Li
School of Information Technology and Electronic Engineering, The University of Queensland, Queensland, Australia
Xue Li
Department of Computing Science, University of Alberta, Edmonton, AB, Canada
Osmar R. Zaïane

About this paper

Cite this paper

Liewlom, P., Rakthanmanon, T., Waiyamai, K. (2007). Prediction of Enzyme Class by Using Reactive Motifs Generated from Binding and Catalytic Sites. In: Alhajj, R., Gao, H., Li, J., Li, X., Zaïane, O.R. (eds) Advanced Data Mining and Applications. ADMA 2007. Lecture Notes in Computer Science, vol 4632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73871-8_41

DOI: https://doi.org/10.1007/978-3-540-73871-8_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73870-1
Online ISBN: 978-3-540-73871-8
eBook Packages: Computer Science, Computer Science (R0)
