Abstract
An important task in gene discovery is the correct prediction of the translation initiation site (TIS). The TIS may correspond to the first AUG in the sequence, but this is not always the case. The task can be modeled as a classification problem between positive (TIS) and negative patterns. Here we use a Support Vector Machine trained on data processed with the class-balancing method SMOTE (Synthetic Minority Over-sampling Technique). SMOTE was used because the databases in this work are imbalanced, with an average positive-to-negative pattern ratio of about 1:28. As a result, we attained accuracy, precision, sensitivity and specificity values of 99% on average.
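The class-balancing step and the four reported measures can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes patterns are already encoded as numeric feature vectors (the abstract does not state the encoding), uses a simplified SMOTE that picks a random minority sample for each synthetic point, and the names `smote` and `confusion_metrics` are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE: create points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def _ratio(num, den):
    # A measure is undefined (NaN) when its denominator is zero.
    return num / den if den else math.nan


def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, precision, sensitivity and specificity from confusion counts."""
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }


# Toy data (assumed): 3 positive patterns against 84 negatives, i.e. 1:28.
positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
n_negatives = 84
synthetic = smote(positives, n_synthetic=n_negatives - len(positives))
print(len(positives) + len(synthetic), "positives after balancing")

# A classifier that always answers "negative" on 1:28 data still looks
# accurate, which is why sensitivity and precision are reported as well.
trivial = confusion_metrics(tp=0, fp=0, tn=28, fn=1)
print(trivial)
```

In a real pipeline, oversampling would be applied only to the training folds, so that synthetic patterns never leak into the data used for evaluation.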
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nobre, C.N., Ortega, J.M., de Pádua Braga, A. (2007). High Efficiency on Prediction of Translation Initiation Site (TIS) of RefSeq Sequences. In: Sagot, MF., Walter, M.E.M.T. (eds) Advances in Bioinformatics and Computational Biology. BSB 2007. Lecture Notes in Computer Science, vol 4643. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73731-5_13
DOI: https://doi.org/10.1007/978-3-540-73731-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73730-8
Online ISBN: 978-3-540-73731-5
eBook Packages: Computer Science, Computer Science (R0)