High Efficiency on Prediction of Translation Initiation Site (TIS) of RefSeq Sequences

  • Conference paper
Advances in Bioinformatics and Computational Biology (BSB 2007)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4643)

Abstract

An important task in gene discovery is the correct prediction of the translation initiation site (TIS). The TIS can correspond to the first AUG, but this is not always the case. The task can be modeled as a classification problem that separates positive patterns (TIS) from negative patterns. Here we use a Support Vector Machine trained on data processed with the class-balancing method SMOTE (Synthetic Minority Over-sampling Technique). SMOTE was used because the databases in this work are imbalanced, with an average positive-to-negative pattern ratio of around 1:28. As a result, we attained accuracy, precision, sensitivity and specificity values of 99% on average.
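The two ideas in the abstract, oversampling the minority class and reporting four separate metrics, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `smote` and `metrics`, the toy two-dimensional points, and the choice of base samples at random are assumptions made for illustration, and the SVM training step is omitted. The sketch creates synthetic positives by interpolating between a minority sample and one of its nearest minority neighbours, then shows why accuracy alone is uninformative at a 1:28 ratio.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE-style oversampling (illustrative variant).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def metrics(tp, fp, tn, fn):
    """Accuracy, precision, sensitivity and specificity from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical toy data: four positive patterns encoded as 2-D points.
positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(positives, n_synthetic=6, k=2)

# A classifier that labels everything negative on a 1:28 data set:
# 1 positive missed, 28 negatives correct.
baseline = metrics(tp=0, fp=0, tn=28, fn=1)
```

With 1 positive per 28 negatives, the all-negative baseline reaches an accuracy of 28/29 (about 96.6%) while its sensitivity is 0, which is why the abstract reports precision, sensitivity and specificity alongside accuracy.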

Editor information

Marie-France Sagot, Maria Emilia M. T. Walter

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nobre, C.N., Ortega, J.M., de Pádua Braga, A. (2007). High Efficiency on Prediction of Translation Initiation Site (TIS) of RefSeq Sequences. In: Sagot, MF., Walter, M.E.M.T. (eds) Advances in Bioinformatics and Computational Biology. BSB 2007. Lecture Notes in Computer Science, vol 4643. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73731-5_13

  • DOI: https://doi.org/10.1007/978-3-540-73731-5_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73730-8

  • Online ISBN: 978-3-540-73731-5

  • eBook Packages: Computer Science, Computer Science (R0)
