Abstract
The purpose of this research is to search for motifs directly at binding and catalytic sites, called reactive motifs, and then to predict enzyme functions from the discovered reactive motifs. The main challenge is that binding or catalytic site data are available for only about 3.34% of all enzymes, and many of these entries provide only a single sequence record. A further challenge is the complexity of the motif combinations needed to predict enzyme functions.
In this paper, we introduce a unique process that combines statistics with biochemical background knowledge to determine reactive motifs. It consists of three procedures: block scan filter, mutation control, and reactive site-group define. The block scan filter processes each single-sequence record of a binding or catalytic site, using a similarity score, to produce high-quality blocks. These blocks are the input to mutation control, where the amino acids at each position of the sequences are analyzed and extended to determine a complete substitution group. The output of the mutation control step is a set of motifs for each single-sequence input record. These motifs are then grouped by the reactive site-group define procedure to produce reactive motifs. The reactive motifs, together with a dataset of known enzyme sequences, are used as input to the C4.5 learning algorithm to obtain an enzyme prediction model, whose accuracy is evaluated on a testing dataset. Across 235 enzyme function classes, the reactive motifs yield the best prediction result with C4.5, an accuracy of 72.58%, outperforming PROSITE motifs.
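The mutation-control and motif-matching steps can be illustrated with a minimal Python sketch. It assumes ungapped, equal-length blocks and uses a small hypothetical table of substitution groups (`SUBSTITUTION_GROUPS`) based on common physico-chemical classes, not the groups the authors derive. The function names are likewise illustrative. Each aligned column becomes its conserved residue, the smallest group covering all observed residues, or the wildcard `x`. The resulting motifs then serve as binary presence/absence attributes for a classifier.

```python
import re

# Hypothetical substitution groups for illustration only (common
# physico-chemical classes), NOT the groups derived in the paper.
SUBSTITUTION_GROUPS = [
    frozenset("DE"),    # acidic
    frozenset("KRH"),   # basic
    frozenset("ST"),    # hydroxyl
    frozenset("NQ"),    # amide
    frozenset("FWY"),   # aromatic
    frozenset("ILMV"),  # aliphatic hydrophobic
    frozenset("AG"),    # small
]


def column_symbol(residues):
    """Return a PROSITE-style element for one aligned column."""
    observed = frozenset(residues)
    if len(observed) == 1:
        return next(iter(observed))  # fully conserved residue
    # Extend the observed residues to the smallest group containing them all.
    candidates = [g for g in SUBSTITUTION_GROUPS if observed <= g]
    if candidates:
        group = min(candidates, key=len)
        return "[" + "".join(sorted(group)) + "]"
    return "x"  # no single group covers the column: use a wildcard


def block_to_motif(block):
    """Turn an ungapped block of equal-length sequences into a motif string."""
    length = len(block[0])
    if any(len(seq) != length for seq in block):
        raise ValueError("all sequences in a block must have the same length")
    return "-".join(column_symbol(col) for col in zip(*block))


def motif_to_regex(motif):
    """Translate a PROSITE-style motif into a compiled regular expression."""
    parts = ["." if element == "x" else element for element in motif.split("-")]
    return re.compile("".join(parts))


def motif_features(sequence, motifs):
    """Binary presence/absence vector with one attribute per motif."""
    return [1 if motif_to_regex(m).search(sequence) else 0 for m in motifs]
```

For example, `block_to_motif(["GDSLV", "GESIV", "GDTLM"])` yields `G-[DE]-[ST]-[ILMV]-[ILMV]`. Vectors from `motif_features`, labelled with enzyme classes, could then be passed to a C4.5 implementation such as Weka's J48. The paper's reactive site-group define step, which groups motifs before learning, is not modelled in this sketch.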
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Liewlom, P., Rakthanmanon, T., Waiyamai, K. (2007). Prediction of Enzyme Class by Using Reactive Motifs Generated from Binding and Catalytic Sites. In: Alhajj, R., Gao, H., Li, J., Li, X., Zaïane, O.R. (eds) Advanced Data Mining and Applications. ADMA 2007. Lecture Notes in Computer Science, vol. 4632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73871-8_41
DOI: https://doi.org/10.1007/978-3-540-73871-8_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73870-1
Online ISBN: 978-3-540-73871-8
eBook Packages: Computer Science, Computer Science (R0)