Feature Selection for Translation Initiation Site Recognition

de Haro-García, Aida; Pérez-Rodríguez, Javier; García-Pedrajas, Nicolás

doi:10.1007/978-3-642-21827-9_37

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6704)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

Translation initiation site (TIS) recognition is one of the first steps in gene structure prediction and a common component of any gene recognition system. Many methods have been described in the literature to identify TIS in transcripts such as mRNA, EST and cDNA sequences. However, recognizing TIS in DNA sequences is a far more challenging task, and the methods described so far for transcripts achieve poor results on DNA sequences. From the point of view of Machine Learning, this problem has two distinguishing characteristics: it is class-imbalanced and it has many features. In this work, we deal with the latter of these two characteristics.
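To make the two characteristics concrete, the following minimal sketch enumerates every ATG in a DNA string as a candidate TIS and cuts a fixed window of nucleotides around it, so that each window position becomes one feature. The function names, the window sizes and the toy sequence are illustrative assumptions; the abstract does not state how candidates or windows were defined.

```python
def candidate_windows(sequence, upstream=3, downstream=3):
    """Return (atg_index, window) pairs for every ATG with full flanking context.

    upstream/downstream are hypothetical window sizes chosen for illustration;
    each character of the returned window is one feature of the candidate.
    """
    sequence = sequence.upper()
    windows = []
    start = sequence.find("ATG")
    while start != -1:
        lo = start - upstream
        hi = start + 3 + downstream
        # Skip candidates too close to either end of the sequence.
        if lo >= 0 and hi <= len(sequence):
            windows.append((start, sequence[lo:hi]))
        start = sequence.find("ATG", start + 1)
    return windows


def imbalance_ratio(labels):
    """Number of negative candidates per positive one (labels are 0 or 1)."""
    positives = sum(1 for y in labels if y == 1)
    if positives == 0:
        raise ValueError("at least one positive example is required")
    return (len(labels) - positives) / positives


if __name__ == "__main__":
    toy = "CCCATGAAATGCCCATGG"  # made-up sequence, not real data
    for index, window in candidate_windows(toy):
        print(index, window)
```

Because every ATG in a genomic sequence is a candidate but few are true initiation sites, the negatives outnumber the positives, and widening the window adds one feature per extra nucleotide.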

Using feature selection techniques, we study the relevance of the individual features used for recognizing TIS, namely the nucleotides that form the sequences. We found that the importance of each base position depends on the type of organism. The feature selection process yields a subset of sequence features that can improve the classification accuracy of the recognizer. Our results using sequences from the human genome, Arabidopsis thaliana and Ustilago maydis show the usefulness of the proposed approach.
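The idea of scoring each base position and keeping only the most relevant ones can be sketched with a simple filter criterion. The example below ranks positions by the mutual information between the nucleotide at that position and the class label. This criterion is a stand-in chosen for illustration: the abstract does not name the feature selection techniques the authors used, and the function names and toy windows are assumptions.

```python
import math
from collections import Counter


def mutual_information(column, labels):
    """Mutual information (in bits) between one sequence position and the label."""
    n = len(labels)
    joint = Counter(zip(column, labels))
    p_x = Counter(column)
    p_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts to limit rounding error.
        mi += p_xy * math.log2(p_xy * n * n / (p_x[x] * p_y[y]))
    return mi


def rank_positions(windows, labels):
    """Return (positions sorted by decreasing relevance, per-position scores)."""
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    scores = [
        mutual_information([w[i] for w in windows], labels)
        for i in range(length)
    ]
    order = sorted(range(length), key=lambda i: scores[i], reverse=True)
    return order, scores


def select_positions(windows, order, k):
    """Keep only the k best-ranked positions, preserving sequence order."""
    keep = sorted(order[:k])
    return ["".join(w[i] for i in keep) for w in windows]


if __name__ == "__main__":
    # Toy windows (made up): only the first position separates the classes.
    windows = ["AATGC", "AATGT", "CATGC", "CATGT"]
    labels = [1, 1, 0, 0]
    order, scores = rank_positions(windows, labels)
    print(order, scores)
    print(select_positions(windows, order, k=1))
```

Running the ranking separately on data from each organism would give one relevance profile per organism, which is the kind of comparison the abstract describes. A wrapper or embedded method could replace the filter score without changing the surrounding structure.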

This work has been financed in part by the Excellence in Research Project P07-TIC-2682 of the Junta de Andalucía.

Author information

Authors and Affiliations

Department of Computing and Numerical Analysis, University of Córdoba, Spain
Aida de Haro-García, Javier Pérez-Rodríguez & Nicolás García-Pedrajas

Editor information

Editors and Affiliations

School of Computer and Information Science, Center for Science and Technology, Syracuse University, 13244-4100, Syracuse, NY, USA
Kishan G. Mehrotra & Chilukuri K. Mohan
Department of Electrical Engineering and Computer Science, Syracuse University, 13244, Syracuse, NY, USA
Jae C. Oh & Pramod K. Varshney
Department of Computer Science, Texas State University San Marcos, 601 University Drive, 78666-4616, San Marcos, TX, USA
Moonis Ali

About this paper

Cite this paper

de Haro-García, A., Pérez-Rodríguez, J., García-Pedrajas, N. (2011). Feature Selection for Translation Initiation Site Recognition. In: Mehrotra, K.G., Mohan, C.K., Oh, J.C., Varshney, P.K., Ali, M. (eds) Modern Approaches in Applied Intelligence. IEA/AIE 2011. Lecture Notes in Computer Science, vol 6704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21827-9_37

DOI: https://doi.org/10.1007/978-3-642-21827-9_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21826-2
Online ISBN: 978-3-642-21827-9
eBook Packages: Computer Science, Computer Science (R0)
