Using Multiple Alignments to Improve Gene Prediction

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2005)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3500)

Abstract

The multiple species de novo gene prediction problem can be stated as follows: given an alignment of genomic sequences from two or more organisms, predict the location and structure of all protein-coding genes in one or more of the sequences. Here, we present a new system, N-SCAN (a.k.a. TWINSCAN 3.0), for addressing this problem. N-SCAN has the ability to model dependencies between the aligned sequences, context-dependent substitution rates, and insertions and deletions in the sequences. An implementation of N-SCAN was created and used to generate predictions for the entire human genome. An analysis of the predictions reveals that N-SCAN’s predictive accuracy in human exceeds that of all previously published whole-genome de novo gene predictors. In addition, predictions were generated for the genome of the fruit fly Drosophila melanogaster to demonstrate the applicability of N-SCAN to invertebrate gene prediction.
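The input and output of the problem stated above can be sketched in a few lines of Python. This is only an illustration of the problem setup, not N-SCAN's implementation: the names `Exon`, `GenePrediction`, and `project_onto_target` are hypothetical, as are the toy sequences, and the choice to drop columns where the target sequence has a gap is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

GAP = "-"  # assumed gap symbol for this toy alignment


@dataclass(frozen=True)
class Exon:
    start: int  # 0-based, inclusive, in target-sequence coordinates
    end: int    # exclusive


@dataclass(frozen=True)
class GenePrediction:
    strand: str
    exons: Tuple[Exon, ...]


def project_onto_target(target: str, informants: List[str]) -> List[Tuple[str, ...]]:
    """Return one column per target base: (target base, informant bases...).

    Columns where the target row has a gap are skipped, so the result is
    indexed by target-sequence position.
    """
    if any(len(row) != len(target) for row in informants):
        raise ValueError("all alignment rows must have the same length")
    columns = []
    for i, base in enumerate(target):
        if base == GAP:
            continue
        columns.append((base,) + tuple(row[i] for row in informants))
    return columns


# Toy input: one target row and two informant rows of equal length.
alignment_target = "ATG-CCTTAA"
alignment_informants = ["ATGACC-TAA", "ATG-CTTTAA"]
columns = project_onto_target(alignment_target, alignment_informants)

# Toy output: a single-exon gene covering the whole target sequence.
prediction = GenePrediction(strand="+", exons=(Exon(0, len(columns)),))
```

Each element of `columns` pairs a target base with the informant characters aligned to it, including gap characters, which is the kind of per-position evidence a multiple-alignment gene predictor would score. The prediction itself is expressed purely in target coordinates.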

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gross, S.S., Brent, M.R. (2005). Using Multiple Alignments to Improve Gene Prediction. In: Miyano, S., Mesirov, J., Kasif, S., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2005. Lecture Notes in Computer Science, vol 3500. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11415770_29

  • DOI: https://doi.org/10.1007/11415770_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25866-7

  • Online ISBN: 978-3-540-31950-4

  • eBook Packages: Computer Science, Computer Science (R0)
