An Evolutionary Algorithm for Gene Structure Prediction

Pérez-Rodríguez, Javier; García-Pedrajas, Nicolás

doi:10.1007/978-3-642-21827-9_40

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6704)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

Gene recognition, gene structure prediction, or gene finding (all three terms, among others, are in use) consists of determining which parts of a genomic sequence are coding and constructing the whole gene from its start site to its stop codon. Gene recognition is one of the most important open problems in Bioinformatics. The process of discovering the putative genes in a genome is called annotation.

There are two basic approaches to gene structure prediction: extrinsic and intrinsic methods. Intrinsic methods are now preferred due to their ability to identify more unknown genes. Gene recognition is a search problem, where many evidence sources are combined in a scoring function that must be maximized to obtain the structure of a probable gene.
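As a rough illustration of the search formulation described above, the following Python sketch scores a candidate gene structure by a weighted sum of two toy evidence sources and picks the best of a few candidates. Everything in it is an assumption made for illustration, not the scoring function of the paper: the names `signal_score`, `frame_score`, and `total_score`, the weights, the half-open exon coordinates, and the toy sequence are all hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # hypothetical encoding: half-open [start, end) coordinates

STOP_CODONS = {"TAA", "TAG", "TGA"}


def coding_sequence(seq: str, exons: List[Exon]) -> str:
    """Concatenate the exon segments into the putative coding sequence."""
    return "".join(seq[start:end] for start, end in exons)


def signal_score(seq: str, exons: List[Exon]) -> float:
    """Toy evidence source 1: one point each for an ATG start, a terminal
    stop codon, and every intron that begins with GT and ends with AG."""
    cds = coding_sequence(seq, exons)
    score = 0.0
    if cds.startswith("ATG"):
        score += 1.0
    if cds[-3:] in STOP_CODONS:
        score += 1.0
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron = seq[prev_end:next_start]
        if intron.startswith("GT") and intron.endswith("AG"):
            score += 1.0
    return score


def frame_score(seq: str, exons: List[Exon]) -> float:
    """Toy evidence source 2: 1.0 if the coding sequence length is a multiple
    of three and no stop codon occurs in frame before the last codon."""
    cds = coding_sequence(seq, exons)
    if len(cds) < 6 or len(cds) % 3 != 0:
        return 0.0
    internal = (cds[i:i + 3] for i in range(0, len(cds) - 3, 3))
    return 0.0 if any(codon in STOP_CODONS for codon in internal) else 1.0


def total_score(seq: str, exons: List[Exon],
                w_signal: float = 1.0, w_frame: float = 2.0) -> float:
    """Weighted combination of the evidence sources (weights are assumptions)."""
    return w_signal * signal_score(seq, exons) + w_frame * frame_score(seq, exons)


# Toy sequence: flank, exon ATGGCC, intron GTAAAG, exon GGATAA, flank.
seq = "CCATGGCCGTAAAGGGATAATT"
candidates = [
    [(2, 8), (14, 20)],  # two exons separated by a GT...AG intron
    [(2, 20)],           # one long exon
    [(2, 8), (15, 20)],  # shifted second exon, breaks the reading frame
]
best = max(candidates, key=lambda exons: total_score(seq, exons))
```

Here the maximization is done by enumerating three candidates; on real genomic sequences the space of possible structures is far too large for that, which is why a search procedure is needed.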

In this paper, we propose the first purely evolutionary algorithm in the literature for gene structure prediction. The application of genetic algorithms to gene recognition will open a new field of research where the flexibility of evolutionary computation can be used to account for the complexities of the problem, which are growing as our knowledge of the molecular processes of transcription and translation deepens.
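To make the idea of an evolutionary search over gene structures concrete, here is a minimal generic genetic algorithm in Python. It is only a sketch under stated assumptions and does not reproduce the algorithm proposed in the paper: the binary coding/non-coding mask encoding, the per-base `evidence` values, the boundary penalty, and all parameter defaults are hypothetical choices made for this example.

```python
import random
from typing import List

Mask = List[int]  # hypothetical encoding: 1 = coding position, 0 = non-coding


def fitness(mask: Mask, evidence: List[float],
            boundary_penalty: float = 0.5) -> float:
    """Sum the per-base evidence over coding positions and subtract a cost
    for every coding/non-coding boundary, to discourage fragmented exons."""
    support = sum(e for m, e in zip(mask, evidence) if m)
    boundaries = sum(1 for a, b in zip(mask, mask[1:]) if a != b)
    return support - boundary_penalty * boundaries


def tournament(pop: List[Mask], scores: List[float],
               rng: random.Random, k: int = 3) -> Mask:
    """Pick the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def crossover(a: Mask, b: Mask, rng: random.Random) -> Mask:
    """One-point crossover: prefix of a joined to suffix of b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(mask: Mask, rng: random.Random, rate: float) -> Mask:
    """Flip each bit independently with probability `rate`."""
    return [1 - m if rng.random() < rate else m for m in mask]


def evolve(evidence: List[float], pop_size: int = 40, generations: int = 60,
           rate: float = 0.02, seed: int = 0) -> Mask:
    """Generational GA with tournament selection and one-individual elitism."""
    rng = random.Random(seed)
    n = len(evidence)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind, evidence) for ind in pop]
        elite = pop[max(range(pop_size), key=lambda i: scores[i])]
        children = [elite[:]]  # the best individual survives unchanged
        while len(children) < pop_size:
            p1 = tournament(pop, scores, rng)
            p2 = tournament(pop, scores, rng)
            children.append(mutate(crossover(p1, p2, rng), rng, rate))
        pop = children
    return max(pop, key=lambda ind: fitness(ind, evidence))


# Toy evidence: negative in the flanks, positive over a 10-base "exon".
evidence = [-1.0] * 5 + [1.0] * 10 + [-1.0] * 5
best_mask = evolve(evidence)
```

Because the elite individual is copied unchanged into each new generation, the best fitness in the population never decreases. A realistic system would replace the toy `fitness` with a scoring function that combines real evidence sources and would likely use an encoding and operators that respect reading frames and splice sites.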

This work has been financed in part by the Excellence in Research Project P07-TIC-2682 of the Junta de Andalucía.

Author information

Authors and Affiliations

Department of Computing and Numerical Analysis, University of Córdoba, Spain
Javier Pérez-Rodríguez & Nicolás García-Pedrajas

Editor information

Editors and Affiliations

School of Computer and Information Science, Center for Science and Technology, Syracuse University, 13244-4100, Syracuse, NY, USA
Kishan G. Mehrotra & Chilukuri K. Mohan
Department of Electrical Engineering and Computer Science, Syracuse University, 13244, Syracuse, NY, USA
Jae C. Oh & Pramod K. Varshney
Department of Computer Science, Texas State University San Marcos, 601 University Drive, 78666-4616, San Marcos, TX, USA
Moonis Ali

Cite this paper

Pérez-Rodríguez, J., García-Pedrajas, N. (2011). An Evolutionary Algorithm for Gene Structure Prediction. In: Mehrotra, K.G., Mohan, C.K., Oh, J.C., Varshney, P.K., Ali, M. (eds) Modern Approaches in Applied Intelligence. IEA/AIE 2011. Lecture Notes in Computer Science, vol 6704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21827-9_40

DOI: https://doi.org/10.1007/978-3-642-21827-9_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21826-2
Online ISBN: 978-3-642-21827-9
eBook Packages: Computer Science, Computer Science (R0)
