
Prediction of Enzyme Class by Using Reactive Motifs Generated from Binding and Catalytic Sites

  • Conference paper
Advanced Data Mining and Applications (ADMA 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4632)


Abstract

The purpose of this research is to search for motifs located directly at binding and catalytic sites, which we call reactive motifs, and then to predict enzyme functions from the discovered reactive motifs. The main challenge is that binding- or catalytic-site data are available for only about 3.34% of all enzymes, and many of these sites are represented by only one sequence record. A second challenge is the complexity of the motif combinations needed to predict enzyme functions.

In this paper, we introduce a unique process that combines statistics with biochemical background knowledge to determine reactive motifs. It consists of three procedures: block scan filter, mutation control, and reactive site-group define. The block scan filter uses a similarity score to alter each single-sequence record of a binding or catalytic site into quality blocks. These blocks are the input to mutation control, where the amino acids at each sequence position are analyzed and extended to determine a complete substitution group. The output of the mutation control step is a set of motifs for each single-sequence input record. These motifs are then grouped by the reactive site-group define procedure to produce reactive motifs. The reactive motifs, together with a dataset of known enzyme sequences, are used as input to the C4.5 learning algorithm to obtain an enzyme prediction model, whose accuracy is evaluated on a test dataset. Across 235 enzyme function classes, reactive motifs with C4.5 yield the best prediction accuracy, 72.58%, outperforming PROSITE motifs.
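The final stage of this process turns motifs into input features for C4.5. As a minimal sketch (not the authors' implementation), assume each reactive motif is a list of per-position substitution groups, render it in a simplified PROSITE-like syntax, mark its presence or absence in each enzyme sequence, and score each motif feature with the gain ratio that C4.5 uses to choose splits. The motifs, sequences and EC labels are hypothetical toy data, and the helper names (`to_prosite`, `feature_vector`, `gain_ratio`) are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical representation: a motif is a list of per-position substitution
# groups (sets of one-letter amino acid codes), standing in for the output of
# the mutation control step.
Motif = list[frozenset[str]]


def to_prosite(motif: Motif) -> str:
    """Render a motif in simplified PROSITE-like syntax, e.g. C-[ST]-G."""
    parts = []
    for group in motif:
        residues = "".join(sorted(group))
        parts.append(residues if len(group) == 1 else f"[{residues}]")
    return "-".join(parts)


def to_regex(motif: Motif) -> re.Pattern:
    """Compile a motif into a regex with one character class per position."""
    return re.compile("".join("[" + "".join(sorted(g)) + "]" for g in motif))


def feature_vector(seq: str, motifs: list[Motif]) -> list[int]:
    """1 if the motif occurs anywhere in the sequence, else 0 (one value per motif)."""
    return [1 if to_regex(m).search(seq) else 0 for m in motifs]


def entropy(labels: list[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(column: list[int], labels: list[str]) -> float:
    """C4.5 split criterion for a binary feature: information gain / split information."""
    total = len(labels)
    remainder = 0.0
    split_info = 0.0
    for value in (0, 1):
        subset = [lab for x, lab in zip(column, labels) if x == value]
        if not subset:
            continue
        weight = len(subset) / total
        remainder += weight * entropy(subset)
        split_info -= weight * math.log2(weight)
    gain = entropy(labels) - remainder
    return gain / split_info if split_info > 0 else 0.0


# Toy data (hypothetical): two motifs, four sequences, two EC classes.
motifs = [
    [frozenset("C"), frozenset("ST"), frozenset("G")],   # C-[ST]-G
    [frozenset("H"), frozenset("DE"), frozenset("KR")],  # H-[DE]-[KR]
]
sequences = ["MKCSGLL", "AACTGHH", "MHEKPLA", "GHDRQQS"]
labels = ["EC1", "EC1", "EC2", "EC2"]

rows = [feature_vector(s, motifs) for s in sequences]
for motif, column in zip(motifs, zip(*rows)):
    print(to_prosite(motif), round(gain_ratio(list(column), labels), 3))
```

With this toy data each motif separates the two classes perfectly, so the script prints `C-[ST]-G 1.0` and `H-[DE]-[KR] 1.0`. A full C4.5 tree would recurse on such splits and prune the result, which this sketch leaves out; it also ignores the rest of the PROSITE pattern syntax (wildcards, exclusions, repeat counts).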




Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Liewlom, P., Rakthanmanon, T., Waiyamai, K. (2007). Prediction of Enzyme Class by Using Reactive Motifs Generated from Binding and Catalytic Sites. In: Alhajj, R., Gao, H., Li, J., Li, X., Zaïane, O.R. (eds) Advanced Data Mining and Applications. ADMA 2007. Lecture Notes in Computer Science, vol 4632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73871-8_41


  • DOI: https://doi.org/10.1007/978-3-540-73871-8_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73870-1

  • Online ISBN: 978-3-540-73871-8

  • eBook Packages: Computer Science, Computer Science (R0)
