Improving the Accuracy of Classifiers for the Prediction of Translation Initiation Sites in Genomic Sequences

  • Conference paper
Advances in Informatics (PCI 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3746)

Abstract

Predicting the Translation Initiation Site (TIS) in a genomic sequence is an important problem in biological research. Although several methods have been proposed, their accuracy leaves considerable room for improvement. Because of noise in the data and biological factors, TIS prediction remains an open and non-trivial problem. In this paper we follow a three-step approach to increase TIS prediction accuracy. In the first step, we apply a feature generation algorithm that we developed. In the second step, all candidate features, including new ones generated by our algorithm, are ranked according to their impact on prediction accuracy. In the third step, a classification model is built from a number of the top-ranked features. We experiment with various feature sets, feature selection methods, and classification algorithms, compare the results with alternative methods, draw conclusions, and propose models with improved prediction accuracy.
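
The three steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify the generated features, the ranking measure, or the classifier. The sketch assumes four hypothetical binary features around each candidate ATG (two of them based on the well-known Kozak context positions -3 and +4), uses information gain as a stand-in ranking criterion, and uses a small Bernoulli naive Bayes model as a stand-in classifier. All function and feature names are illustrative.

```python
import math
from collections import Counter

STOP_CODONS = ("TAA", "TAG", "TGA")

def candidate_sites(seq):
    """Return the index of every ATG in seq; each one is a candidate TIS."""
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]

def features(seq, pos, window=9):
    """Step 1 (assumed features): binary descriptors of the ATG at `pos`.

    Positions follow the Kozak convention: the A of ATG is +1, so the
    base at -3 is seq[pos - 3] and the base at +4 is seq[pos + 3].
    """
    up = seq[max(0, pos - window):pos]
    down = seq[pos + 3:pos + 3 + window]
    feats = {
        "purine_at_-3": pos >= 3 and seq[pos - 3] in "AG",
        "G_at_+4": pos + 3 < len(seq) and seq[pos + 3] == "G",
        "upstream_ATG": "ATG" in up,
        "downstream_in_frame_stop": any(
            down[j:j + 3] in STOP_CODONS for j in range(0, len(down) - 2, 3)
        ),
    }
    return {name: int(value) for name, value in feats.items()}

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, name):
    """Reduction in label entropy after splitting on binary feature `name`."""
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for r, y in zip(rows, labels) if r[name] == value]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain

def rank_features(rows, labels):
    """Step 2 (assumed criterion): order feature names by information gain."""
    names = list(rows[0].keys())
    return sorted(names, key=lambda n: info_gain(rows, labels, n), reverse=True)

def train_nb(rows, labels, names):
    """Step 3 (assumed classifier): Bernoulli naive Bayes on `names` only."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        prior = len(members) / len(rows)
        # Laplace smoothing keeps every probability strictly inside (0, 1).
        probs = {
            n: (sum(r[n] for r in members) + 1) / (len(members) + 2)
            for n in names
        }
        model[cls] = (prior, probs)
    return model

def predict_nb(model, row):
    """Return the class with the highest log-posterior for `row`."""
    def log_score(cls):
        prior, probs = model[cls]
        return math.log(prior) + sum(
            math.log(p if row[n] else 1 - p) for n, p in probs.items()
        )
    return max(model, key=log_score)
```

In use, one would call `features` on every index returned by `candidate_sites` for each labelled sequence, pass the resulting rows and labels to `rank_features`, keep the first few names, and train with `train_nb` on that subset. How many top-ranked features to keep is a tuning choice that the abstract leaves open.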



Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tzanis, G., Berberidis, C., Alexandridou, A., Vlahavas, I. (2005). Improving the Accuracy of Classifiers for the Prediction of Translation Initiation Sites in Genomic Sequences. In: Bozanis, P., Houstis, E.N. (eds) Advances in Informatics. PCI 2005. Lecture Notes in Computer Science, vol 3746. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573036_40

  • DOI: https://doi.org/10.1007/11573036_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29673-7

  • Online ISBN: 978-3-540-32091-3

  • eBook Packages: Computer Science, Computer Science (R0)
