
Data Structures for Genome Annotation, Alternative Splicing, and Validation

  • Conference paper
Data Integration in the Life Sciences (DILS 2006)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4075)



Abstract

To establish a clean basis for studying alternative splicing and gene regulation in life science projects, a powerful data model and a strict validation procedure for assigning levels of reliability to given gene models are essential. One common problem of public genome databases is insufficiently organized and linked description data, which makes it difficult to study relations between the alternative isoforms of a gene that are relevant for medicine and plant genome research. This is a severe obstacle to the integration of biological data and motivated us to establish a new modeling instance that we call a splice template, or sTMP. Every sTMP has a unique splicing pattern, but the lengths of the first and the last exon remain undefined. This allows different gene isoforms with the same splicing pattern to be modeled by a single sTMP. Using this more fine-grained data structure uncovers many cases of plurivalent (not one-to-one) mRNA-CDS relations. In the human genome, there are more than 3,000 extra CDSs that are compatible with the categories sTMP, mRNA and CDS and go beyond the classical one-to-one relation between mRNAs and CDSs. In one case, 11 extra CDSs are compatible with a single mRNA. Crosslinks between mRNAs that derive from different sTMPs but lead to the same CDS become accessible, as do disease-related disruptions in UTR regions. This makes it possible to discover and validate disease- and tissue-specific differences in alternative splicing, gene expression and regulation. Another problem in public databases is an overly relaxed standard for labeling genes as “confirmed by ESTs and full-length cDNAs.” We provide a pipeline that handles gene annotations from different sources, integrates them into complex gene models and assigns strict validation tags, constrained by a local low-error model for the alignments between genome annotation and transcripts.
The data structures are being implemented and made publicly available at the Plant Data Warehouse of the Bioinformatics Center Gatersleben-Halle (http://portal.bic-gh.de/sTMP).
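The splice template idea can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper: each mRNA or CDS is given as sorted, non-overlapping genomic exon intervals (1-based, inclusive) on one chromosome and strand; an sTMP is keyed by its intron chain, so the outer ends of the first and last exon play no role; and a CDS counts as compatible with a transcript when both CDS ends are exonic and the introns it spans match the transcript's introns exactly. All names (`Transcript`, `intron_chain`, `stmp_key`, `open_ends`, `cds_compatible`) and the coordinates are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import inf


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    strand: str
    # Sorted, non-overlapping genomic (start, end) pairs, 1-based inclusive.
    exons: tuple


def intron_chain(exons):
    """Introns implied by consecutive exons: (end_i + 1, start_{i+1} - 1)."""
    return tuple((a[1] + 1, b[0] - 1) for a, b in zip(exons, exons[1:]))


def stmp_key(t):
    """Hypothetical sTMP key: location plus splicing pattern (intron chain).
    The outer ends of the first and last exon are deliberately excluded."""
    return (t.chrom, t.strand, intron_chain(t.exons))


def group_by_stmp(transcripts):
    """Group transcript names that share the same splicing pattern."""
    groups = defaultdict(list)
    for t in transcripts:
        groups[stmp_key(t)].append(t.name)
    return dict(groups)


def open_ends(exons):
    """Template view of a transcript: first exon start and last exon end left open."""
    ex = list(exons)
    ex[0] = (-inf, ex[0][1])
    ex[-1] = (ex[-1][0], inf)
    return tuple(ex)


def is_exonic(pos, exons):
    return any(s <= pos <= e for s, e in exons)


def cds_compatible(mrna_exons, cds_exons):
    """Assumed rule: both CDS ends lie in exons of the transcript, and the
    transcript introns inside the CDS span are exactly the CDS introns."""
    lo, hi = cds_exons[0][0], cds_exons[-1][1]
    if not (is_exonic(lo, mrna_exons) and is_exonic(hi, mrna_exons)):
        return False
    inner = tuple(i for i in intron_chain(mrna_exons) if lo < i[0] and i[1] < hi)
    return inner == intron_chain(cds_exons)


# Hypothetical example data.
m1 = Transcript("mRNA-1", "chr1", "+", ((100, 200), (300, 400), (500, 650)))
m2 = Transcript("mRNA-2", "chr1", "+", ((120, 200), (300, 400), (500, 700)))
m3 = Transcript("mRNA-3", "chr1", "+", ((100, 200), (500, 650)))  # skips exon 2
cds = ((150, 200), (300, 400), (500, 560))
cds2 = ((50, 200), (300, 400), (500, 560))  # starts upstream of mRNA-1

if __name__ == "__main__":
    print(group_by_stmp([m1, m2, m3]))             # mRNA-1 and mRNA-2 share an sTMP
    print(cds_compatible(m1.exons, cds))            # True
    print(cds_compatible(m3.exons, cds))            # False: different introns
    print(cds_compatible(m1.exons, cds2))           # False: start lies outside mRNA-1
    print(cds_compatible(open_ends(m1.exons), cds2))  # True: fits the template
```

The last two checks show how leaving the first and last exon open lets a CDS attach to a splice template even when it extends beyond one particular mRNA, which is one way extra, non-one-to-one mRNA-CDS relations can surface. A real pipeline would additionally need to handle reading frames, single-exon models and the alignment-quality constraints used for validation tags.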




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mielordt, S., Grosse, I., Kleffe, J. (2006). Data Structures for Genome Annotation, Alternative Splicing, and Validation. In: Leser, U., Naumann, F., Eckman, B. (eds) Data Integration in the Life Sciences. DILS 2006. Lecture Notes in Computer Science, vol 4075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11799511_11


  • DOI: https://doi.org/10.1007/11799511_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-36593-8

  • Online ISBN: 978-3-540-36595-2

  • eBook Packages: Computer Science, Computer Science (R0)
