Comparative Methods for Gene Structure Prediction in Homologous Sequences

  • Conference paper
Algorithms in Bioinformatics (WABI 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2452)

Abstract

The increasing number of sequenced genomes motivates the use of evolutionary patterns to detect genes. We present a series of comparative methods for gene finding in homologous prokaryotic or eukaryotic sequences. Based on a model of legal genes and a similarity measure between genes, we find the pair of legal genes of maximum similarity. We develop methods based on gene models and alignment-based similarity measures of increasing complexity, which take into account many details of real gene structures, e.g. the similarity of the proteins encoded by the exons. When the similarity measure is based on an existing alignment, the methods run in linear time. When the alignment and prediction processes are integrated, which allows for more fine-grained similarity measures, the methods run in quadratic time. We evaluate the methods in a series of experiments on synthetic and real sequence data, which show that all methods are competitive but that taking the similarity of the encoded proteins into account substantially boosts performance.
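To make the linear-time case concrete, here is a deliberately simplified toy sketch; it is not the authors' method, gene model, or similarity measure. It assumes the existing alignment is gapless (both rows share coordinates), that genes are single-exon (prokaryote-like), and that a "legal gene pair" is a segment starting with ATG and ending with a stop codon in both rows, in the same frame, with no earlier in-frame stop in either row. Each codon scores +1 when both rows encode the same amino acid and -1 otherwise, a crude stand-in for protein-level similarity. The names `CODON_TABLE`, `translate`, and `best_gene_pair` are hypothetical. Because a segment's score is a difference of prefix sums, a single scan per frame that remembers the lowest-prefix open start finds the optimum in O(n).

```python
from typing import Optional, Tuple

# Standard genetic code; codons enumerated in TCAG order (first, second, third base).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(codon: str) -> str:
    """Translate one DNA codon; codons with unknown characters give 'X'."""
    return CODON_TABLE.get(codon.upper(), "X")


def best_gene_pair(s1: str, s2: str) -> Optional[Tuple[int, int, int]]:
    """Return (start, end, score) of the best aligned ORF pair, or None.

    Toy model (assumption): s1 and s2 are the rows of a gapless alignment.
    The returned segment s1[start:end] / s2[start:end] includes both the
    start codon and the stop codon.
    """
    if len(s1) != len(s2):
        raise ValueError("rows of a gapless alignment must have equal length")
    best: Optional[Tuple[int, int, int]] = None
    for frame in range(3):
        prefix = 0        # summed score of earlier codons in this frame
        open_start = None  # (prefix before start codon, start position)
        for pos in range(frame, len(s1) - 2, 3):
            aa1 = translate(s1[pos:pos + 3])
            aa2 = translate(s2[pos:pos + 3])
            # Protein-level codon score: identical amino acids reward, else penalty.
            s = 1 if aa1 == aa2 != "X" else -1
            if aa1 == "*" and aa2 == "*":
                # Stop codon in both rows: close the open gene, stop included.
                if open_start is not None:
                    score = prefix + s - open_start[0]
                    if best is None or score > best[2]:
                        best = (open_start[1], pos + 3, score)
                open_start = None
            elif aa1 == "*" or aa2 == "*":
                # Stop in only one row: no legal gene pair can span this codon.
                open_start = None
            elif aa1 == "M" and aa2 == "M":
                # ATG in both rows: keep the start with the lowest prefix score,
                # which maximizes (prefix at stop) - (prefix at start).
                if open_start is None or prefix < open_start[0]:
                    open_start = (prefix, pos)
            prefix += s
    return best
```

For example, `best_gene_pair("ATGGCTTAA", "ATGGCCTAG")` should return `(0, 9, 3)`: GCT and GCC both encode alanine, so a synonymous substitution is not penalized, which is the kind of signal a protein-aware similarity measure is meant to capture. Handling gaps, introns, and splice sites, as the abstract describes, would require a richer model.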

Partially supported by the Future and Emerging Technologies programme of the EU under contract number IST-1999-14186 (ALCOM-FT).

Bioinformatics Research Center (BiRC), www.birc.dk, funded by Aarhus University Research Foundation.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pedersen, C.N., Scharling, T. (2002). Comparative Methods for Gene Structure Prediction in Homologous Sequences. In: Guigó, R., Gusfield, D. (eds) Algorithms in Bioinformatics. WABI 2002. Lecture Notes in Computer Science, vol 2452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45784-4_17

  • DOI: https://doi.org/10.1007/3-540-45784-4_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44211-0

  • Online ISBN: 978-3-540-45784-8

  • eBook Packages: Springer Book Archive
