Human Genome Annotation

(Invited Keynote Talk)

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2010)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6053)

Abstract

A central problem for 21st-century science is annotating the human genome and making this annotation useful for the interpretation of personal genomes. My talk will focus on annotating the 99% of the genome that does not code for canonical genes, concentrating on intergenic features such as structural variants (SVs), pseudogenes (protein fossils), binding sites, and novel transcribed RNAs (ncRNAs). In particular, I will describe how we identify regulatory sites and variable blocks (SVs) by processing next-generation sequencing experiments. I will further explain how we cluster groups of sites together to create larger annotations. Next, I will discuss a comprehensive pseudogene identification pipeline, which has enabled us to identify >10K pseudogenes in the genome and analyze their distribution with respect to age, protein family, and chromosomal location. Throughout, I will try to introduce some of the computational algorithms and approaches that are required for genome annotation. Much of this work has been carried out in the framework of the ENCODE, modENCODE, and 1000 Genomes projects.
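
As a rough illustration of what clustering groups of sites into larger annotations can look like, the sketch below merges sites that lie within a fixed gap of one another on the same chromosome. It is not the method used in the talk: the function name `cluster_sites`, the `max_gap` parameter, and the (chromosome, start, end) tuple layout are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical site representation: (chromosome, start, end), half-open coordinates.
Site = Tuple[str, int, int]
# Hypothetical cluster representation: (chromosome, start, end, number of member sites).
Cluster = Tuple[str, int, int, int]


def cluster_sites(sites: List[Site], max_gap: int = 100) -> List[Cluster]:
    """Merge sites on the same chromosome whose gap is at most max_gap bases."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in sites:
        by_chrom[chrom].append((start, end))

    clusters: List[Cluster] = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        count = 1
        for start, end in intervals[1:]:
            if start - cur_end <= max_gap:
                # Close enough (or overlapping): extend the current cluster.
                cur_end = max(cur_end, end)
                count += 1
            else:
                # Gap too large: close the current cluster and start a new one.
                clusters.append((chrom, cur_start, cur_end, count))
                cur_start, cur_end, count = start, end, 1
        clusters.append((chrom, cur_start, cur_end, count))
    return clusters


example_sites = [
    ("chr1", 100, 120),
    ("chr1", 150, 170),
    ("chr1", 1000, 1020),
    ("chr2", 5, 10),
]
example_clusters = cluster_sites(example_sites, max_gap=50)
```

With `max_gap=50`, the first two chr1 sites fall into one cluster and the remaining sites stay separate. A fixed gap threshold is only the simplest possible choice; a real annotation pipeline would likely weigh signal strength and statistical significance as well.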

References

  1. http://pseudogene.org

  2. http://GenomeTECH.Gersteinlab.org

  3. Balasubramanian, S., Zheng, D., Liu, Y.J., Fang, G., Frankish, A., Carriero, N., Robilotto, R., Cayting, P., Gerstein, M.: Comparative analysis of processed ribosomal protein pseudogenes in four mammalian genomes. Genome Biol. 10, R2 (2009)

  4. Du, J., Bjornson, R.D., Zhang, Z.D., Kong, Y., Snyder, M., Gerstein, M.B.: Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants. PLoS Comput. Biol. 5, e1000432 (2009)

  5. Kim, P.M., Lam, H.Y., Urban, A.E., Korbel, J.O., Affourtit, J., Grubert, F., Chen, X., Weissman, S., Snyder, M., Gerstein, M.B.: Analysis of copy number variants and segmental duplications in the human genome: Evidence for a change in the process of formation in recent evolutionary history. Genome Res. 18, 1865–1874 (2008)

  6. Korbel, J.O., Abyzov, A., Mu, X.J., Carriero, N., Cayting, P., Zhang, Z., Snyder, M., Gerstein, M.B.: PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data. Genome Biol. 10, R23 (2009)

  7. Lam, H.Y., Khurana, E., Fang, G., Cayting, P., Carriero, N., Cheung, K.H., Gerstein, M.B.: Pseudofam: the pseudogene families database. Nucleic Acids Res. 37, D738–D743 (2009)

  8. Lam, H.Y., Mu, X.J., Stütz, A.M., Tanzer, A., Cayting, P.D., Snyder, M., Kim, P.M., Korbel, J.O., Gerstein, M.B.: Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library. Nat. Biotechnol. 28, 47–55 (2010)

  9. Rozowsky, J., Euskirchen, G., Auerbach, R.K., Zhang, Z.D., Gibson, T., Bjornson, R., Carriero, N., Snyder, M., Gerstein, M.B.: PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat. Biotechnol. 27, 66–75 (2009)

  10. Snyder, M., Weissman, S., Gerstein, M.: Personal phenotypes to go with personal genomes. Mol. Syst. Biol. 5, 273 (2009)

  11. Wang, L.Y., Abyzov, A., Korbel, J.O., Snyder, M., Gerstein, M.: MSB: A mean-shift-based approach for the analysis of structural variation in the genome. Genome Res. 19, 106–117 (2009)

  12. Zhang, Z.D., Paccanaro, A., Fu, Y., Weissman, S., Weng, Z., Chang, J., Snyder, M., Gerstein, M.B.: Statistical analysis of the genomic distribution and correlation of regulatory elements in the ENCODE regions. Genome Res. 17, 787–797 (2007)

  13. Zheng, D., Frankish, A., Baertsch, R., Kapranov, P., Reymond, A., Choo, S.W., Lu, Y., Denoeud, F., Antonarakis, S.E., Snyder, M., Ruan, Y., Wei, C.L., Gingeras, T.R., Guigo, R., Harrow, J., Gerstein, M.B.: Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution. Genome Res. 17, 839–851 (2007)

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gerstein, M. (2010). Human Genome Annotation. In: Borodovsky, M., Gogarten, J.P., Przytycka, T.M., Rajasekaran, S. (eds) Bioinformatics Research and Applications. ISBRA 2010. Lecture Notes in Computer Science, vol 6053. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13078-6_7

  • DOI: https://doi.org/10.1007/978-3-642-13078-6_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13077-9

  • Online ISBN: 978-3-642-13078-6

  • eBook Packages: Computer Science, Computer Science (R0)
