Abstract
A central problem for 21st-century science is annotating the human genome and making this annotation useful for the interpretation of personal genomes. My talk will focus on annotating the 99% of the genome that does not code for canonical genes, concentrating on intergenic features such as structural variants (SVs), pseudogenes (protein fossils), binding sites, and novel transcribed RNAs (ncRNAs). In particular, I will describe how we identify regulatory sites and variable blocks (SVs) by processing next-generation sequencing experiments. I will further explain how we cluster groups of sites together to create larger annotations. Next, I will discuss a comprehensive pseudogene identification pipeline, which has enabled us to identify more than 10,000 pseudogenes in the genome and to analyze their distribution with respect to age, protein family, and chromosomal location. Throughout, I will introduce some of the computational algorithms and approaches required for genome annotation. Much of this work has been carried out within the framework of the ENCODE, modENCODE, and 1000 Genomes projects.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gerstein, M. (2010). Human Genome Annotation. In: Borodovsky, M., Gogarten, J.P., Przytycka, T.M., Rajasekaran, S. (eds) Bioinformatics Research and Applications. ISBRA 2010. Lecture Notes in Computer Science, vol 6053. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13078-6_7
DOI: https://doi.org/10.1007/978-3-642-13078-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13077-9
Online ISBN: 978-3-642-13078-6
eBook Packages: Computer Science, Computer Science (R0)