Human Genome Annotation

Gerstein, Mark

doi:10.1007/978-3-642-13078-6_7


Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6053)

Included in the following conference series:

International Symposium on Bioinformatics Research and Applications


Abstract

A central problem for 21st-century science is annotating the human genome and making this annotation useful for interpreting personal genomes. My talk will focus on annotating the 99% of the genome that does not code for canonical genes, concentrating on intergenic features such as structural variants (SVs), pseudogenes (protein fossils), binding sites, and novel transcribed non-coding RNAs (ncRNAs). In particular, I will describe how we identify regulatory sites and variable blocks (SVs) by processing data from next-generation sequencing experiments. I will further explain how we cluster groups of sites into larger annotations. Next, I will discuss a comprehensive pseudogene identification pipeline, which has enabled us to identify more than 10,000 pseudogenes in the genome and to analyze their distribution with respect to age, protein family, and chromosomal location. Throughout, I will introduce some of the computational algorithms and approaches that genome annotation requires. Much of this work has been carried out within the framework of the ENCODE, modENCODE, and 1000 Genomes projects.
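The abstract does not say how individual sites are clustered into larger annotations. The following is a minimal sketch under stated assumptions, not the method presented in the talk. It assumes that sites are half-open (chromosome, start, end) intervals and that clustering means simple distance-based merging (single linkage on genomic gaps). The function name `cluster_sites` and the thresholds `max_gap` and `min_sites` are hypothetical illustrations. They do not come from the talk or from any published pipeline.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open coordinates.
Site = Tuple[str, int, int]
Cluster = Tuple[str, int, int, int]  # (chromosome, start, end, number of sites)


def cluster_sites(sites: List[Site], max_gap: int = 500,
                  min_sites: int = 2) -> List[Cluster]:
    """Merge sites on the same chromosome whose distance to the running
    cluster end is at most max_gap (single-linkage on gaps), and keep only
    clusters containing at least min_sites sites. Both thresholds are
    illustrative assumptions, not published parameters."""
    clusters: List[Cluster] = []
    current = None  # running cluster as [chrom, start, end, count]

    def flush() -> None:
        # Emit the running cluster if it is large enough.
        if current is not None and current[3] >= min_sites:
            clusters.append((current[0], current[1], current[2], current[3]))

    # Sorting by (chrom, start, end) makes each chromosome's sites contiguous.
    for chrom, start, end in sorted(sites):
        if current is not None and chrom == current[0] and start - current[2] <= max_gap:
            current[2] = max(current[2], end)  # extend the cluster
            current[3] += 1
        else:
            flush()
            current = [chrom, start, end, 1]  # start a new cluster
    flush()
    return clusters
```

For example, with the default `max_gap=500`, two sites 280 bp apart on chr1 merge into one cluster, while an isolated site thousands of bases away stays on its own and is dropped by `min_sites=2`. A real annotation pipeline would more likely choose thresholds from the data (for instance by comparison with shuffled sites) than fix them by hand.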


References

http://pseudogene.org
http://GenomeTECH.Gersteinlab.org
Balasubramanian, S., Zheng, D., Liu, Y.J., Fang, G., Frankish, A., Carriero, N., Robilotto, R., Cayting, P., Gerstein, M.: Comparative analysis of processed ribosomal protein pseudogenes in four mammalian genomes. Genome Biol. 10, R2 (2009)
Du, J., Bjornson, R.D., Zhang, Z.D., Kong, Y., Snyder, M., Gerstein, M.B.: Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants. PLoS Comput. Biol. 5, e1000432 (2009)
Kim, P.M., Lam, H.Y., Urban, A.E., Korbel, J.O., Affourtit, J., Grubert, F., Chen, X., Weissman, S., Snyder, M., Gerstein, M.B.: Analysis of copy number variants and segmental duplications in the human genome: Evidence for a change in the process of formation in recent evolutionary history. Genome Res. 18, 1865–1874 (2008)
Korbel, J.O., Abyzov, A., Mu, X.J., Carriero, N., Cayting, P., Zhang, Z., Snyder, M., Gerstein, M.B.: PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data. Genome Biol. 10, R23 (2009)
Lam, H.Y., Khurana, E., Fang, G., Cayting, P., Carriero, N., Cheung, K.H., Gerstein, M.B.: Pseudofam: the pseudogene families database. Nucleic Acids Res. 37, D738–D743 (2009)
Lam, H.Y., Mu, X.J., Stütz, A.M., Tanzer, A., Cayting, P.D., Snyder, M., Kim, P.M., Korbel, J.O., Gerstein, M.B.: Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library. Nat. Biotechnol. 28, 47–55 (2010)
Rozowsky, J., Euskirchen, G., Auerbach, R.K., Zhang, Z.D., Gibson, T., Bjornson, R., Carriero, N., Snyder, M., Gerstein, M.B.: PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat. Biotechnol. 27, 66–75 (2009)
Snyder, M., Weissman, S., Gerstein, M.: Personal phenotypes to go with personal genomes. Mol. Syst. Biol. 5, 273 (2009)
Wang, L.Y., Abyzov, A., Korbel, J.O., Snyder, M., Gerstein, M.: MSB: A mean-shift-based approach for the analysis of structural variation in the genome. Genome Res. 19, 106–117 (2009)
Zhang, Z.D., Paccanaro, A., Fu, Y., Weissman, S., Weng, Z., Chang, J., Snyder, M., Gerstein, M.B.: Statistical analysis of the genomic distribution and correlation of regulatory elements in the ENCODE regions. Genome Res. 17, 787–797 (2007)
Zheng, D., Frankish, A., Baertsch, R., Kapranov, P., Reymond, A., Choo, S.W., Lu, Y., Denoeud, F., Antonarakis, S.E., Snyder, M., Ruan, Y., Wei, C.L., Gingeras, T.R., Guigo, R., Harrow, J., Gerstein, M.B.: Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution. Genome Res. 17, 839–851 (2007)


Author information

Authors and Affiliations

Yale University, New Haven, CT, 06520, USA
Mark Gerstein


Editor information

Editors and Affiliations

School of Biology, Department of Biomedical Engineering and College of Computing, Georgia Institute of Technology, Atlanta, Georgia, USA
Mark Borodovsky
Molecular and Cell Biology Department, University of Connecticut, 91 North Eagleville Road, Unit 3125, 06269-3125, Storrs, CT, USA
Johann Peter Gogarten
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Building 38A 8S814, 20894, Bethesda, MD, USA
Teresa M. Przytycka
Department of Computer Science and Engineering, University of Connecticut, 257 ITE Building, 371 Fairfield Way, 06269-2155, Storrs, CT, USA
Sanguthevar Rajasekaran


About this paper

Cite this paper

Gerstein, M. (2010). Human Genome Annotation. In: Borodovsky, M., Gogarten, J.P., Przytycka, T.M., Rajasekaran, S. (eds) Bioinformatics Research and Applications. ISBRA 2010. Lecture Notes in Computer Science, vol 6053. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13078-6_7


DOI: https://doi.org/10.1007/978-3-642-13078-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13077-9
Online ISBN: 978-3-642-13078-6
eBook Packages: Computer Science, Computer Science (R0)
