Comparative Methods for Gene Structure Prediction in Homologous Sequences

Pedersen, Christian N.S.; Scharling, Tejs

doi:10.1007/3-540-45784-4_17

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2452)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

The increasing number of sequenced genomes motivates the use of evolutionary patterns to detect genes. We present a series of comparative methods for gene finding in homologous prokaryotic or eukaryotic sequences. Based on a model of legal genes and a similarity measure between genes, we find the pair of legal genes of maximum similarity. We develop methods based on gene models and alignment-based similarity measures of increasing complexity, which take into account many details of real gene structures, e.g. the similarity of the proteins encoded by the exons. When using a similarity measure based on an existing alignment, the methods run in linear time. When the alignment and prediction processes are integrated, which allows for more fine-grained similarity measures, the methods run in quadratic time. We evaluate the methods in a series of experiments on synthetic and real sequence data, which show that all methods are competitive, but that taking the similarity of the encoded proteins into account substantially improves performance.
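
The linear-time case mentioned in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: it assumes a prokaryotic-style gene model (an ATG start, an in-frame stop codon, no earlier in-frame stop), an existing gap-free alignment in which both genes occupy the same coordinates, and a per-column match/mismatch score. The function name `best_gene_pair` and the scoring parameters are hypothetical choices made for this example only.

```python
# Toy illustration (assumptions: gap-free alignment, co-located genes,
# ATG start, standard stop codons, +1/-1 column scores).
STOPS = {"TAA", "TAG", "TGA"}
START = "ATG"


def best_gene_pair(a, b, match=1, mismatch=-1):
    """Return (score, start, end) for the most similar pair of legal genes.

    a and b are equal-length DNA strings aligned column by column.
    A legal gene is the half-open interval [start, end) that begins with
    ATG, ends with a stop codon, and has no earlier in-frame stop.
    Returns None if no position holds a legal gene in both sequences.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    n = len(a)

    # prefix[i] is the total similarity of columns 0 .. i-1.
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i + 1] = prefix[i] + (match if a[i] == b[i] else mismatch)

    best = None
    for frame in range(3):
        # Start position with the smallest prefix sum since the last stop.
        open_start = None
        for i in range(frame, n - 2, 3):
            ca, cb = a[i:i + 3], b[i:i + 3]
            a_stop, b_stop = ca in STOPS, cb in STOPS
            if a_stop or b_stop:
                # Only a stop shared by both sequences closes a legal pair.
                if a_stop and b_stop and open_start is not None:
                    score = prefix[i + 3] - prefix[open_start]
                    if best is None or score > best[0]:
                        best = (score, open_start, i + 3)
                # A stop in either sequence ends every open reading frame.
                open_start = None
            elif ca == START and cb == START:
                if open_start is None or prefix[i] < prefix[open_start]:
                    open_start = i
    return best


if __name__ == "__main__":
    print(best_gene_pair("CCATGAAACCCTAGGG", "CCATGAAGCCCTAAGG"))
```

Each column is visited a constant number of times per reading frame, so the running time is linear in the alignment length. Handling gaps, introns, or protein-level similarity, as the abstract describes, would require a richer model than this sketch.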

Partially supported by the Future and Emerging Technologies programme of the EU under contract number IST-1999-14186 (ALCOM-FT).

Bioinformatics Research Center (BiRC), www.birc.dk, funded by Aarhus University Research Foundation.

Author information

Authors and Affiliations

BiRC, Department of Computer Science, University of Aarhus, Ny Munkegade, Building 540, 8000, Århus C, DK, Denmark
Christian N.S. Pedersen & Tejs Scharling

Editor information

Editors and Affiliations

IMIM-UPF-CRG, Dr. Aiguader 80, 08003, Barcelona, Spain
Roderic Guigó
Department of Computer Science, University of California, 95616, Davis, CA, USA
Dan Gusfield

Cite this paper

Pedersen, C.N., Scharling, T. (2002). Comparative Methods for Gene Structure Prediction in Homologous Sequences. In: Guigó, R., Gusfield, D. (eds) Algorithms in Bioinformatics. WABI 2002. Lecture Notes in Computer Science, vol 2452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45784-4_17

DOI: https://doi.org/10.1007/3-540-45784-4_17
Published: 10 October 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44211-0
Online ISBN: 978-3-540-45784-8
eBook Packages: Springer Book Archive
