survey

DNA Sequencing Technologies: Sequencing Data Protocols and Bioinformatics Tools

Published: 13 September 2019

Abstract

Recent advances in DNA sequencing technology, from first-generation sequencing (FGS) to third-generation sequencing (TGS), have continually transformed the genome research landscape. The data throughput of current platforms is unprecedented, severalfold higher than that of past technologies. DNA sequencing technologies generate sequencing data that are big, sparse, and heterogeneous. This has driven the rapid development of various data protocols and bioinformatics tools for handling sequencing data.

This review takes a historical snapshot of DNA sequencing, with an emphasis on data manipulation and tools. The technological history of DNA sequencing is described and reviewed in detail. Different data protocols for manipulating the generated sequencing data are then introduced and reviewed. In particular, data compression methods are highlighted and discussed to give readers a practical perspective on real-world settings. A wide variety of bioinformatics tools are also reviewed to help readers extract the most from their sequencing data in different aspects, such as sequencing quality control, genomic visualization, single-nucleotide variant calling, INDEL calling, structural variation calling, and integrative analysis. Toward the end of the article, we critically discuss the pitfalls of existing DNA sequencing technologies and potential solutions.

BMC Bioinformatics 11, 1, 187.Google ScholarGoogle ScholarCross RefCross Ref
  109. Jeongsu Oh, Byung Kwon Kim, Wan-Sup Cho, Soon Gyu Hong, and Kyung Mo Kim. 2012. PyroTrimmer: A software with GUI for pre-processing 454 amplicon sequences. Journal of Microbiology 50, 5, 766--769.Google ScholarGoogle ScholarCross RefCross Ref
  110. Yukiteru Ono, Kiyoshi Asai, and Michiaki Hamada. 2013. PBSIM: PacBio reads simulator toward accurate genome assembly. Bioinformatics 29, 1, 119--121. Google ScholarGoogle ScholarDigital LibraryDigital Library
  111. Fatih Ozsolak, Philipp Kapranov, Sylvain Foissac, Sang Woo Kim, Elane Fishilevich, A. Paula Monaghan, Bino John, and Patrice M. Milos. 2010. Comprehensive polyadenylation site maps in yeast and human reveal pervasive alternative polyadenylation. Cell 143, 6, 1018--1029.Google ScholarGoogle ScholarCross RefCross Ref
  112. Fatih Ozsolak, Adam R. Platt, Dan R. Jones, Jeffrey G. Reifenberger, Lauryn E. Sass, Peter McInerney, John F. Thompson, Jayson Bowers, Mirna Jarosz, and Patrice M. Milos. 2009. Direct RNA sequencing. Nature 461, 7265, 814--818.Google ScholarGoogle Scholar
  113. Stephan Pabinger, Andreas Dander, Maria Fischer, Rene Snajder, Michael Sperk, Mirjana Efremova, Birgit Krabichler, Michael R. Speicher, Johannes Zschocke, and Zlatko Trajanoski. 2014. A survey of tools for variant analysis of next-generation genome sequencing data. Briefings in Bioinformatics 15, 2, 256--278.Google ScholarGoogle ScholarCross RefCross Ref
  114. Swati Parekh, Christoph Ziegenhain, Beate Vieth, Wolfgang Enard, and Ines Hellmann. 2016. The impact of amplification on differential expression analyses by RNA-seq. Scientific Reports 6 (2016), 25533.Google ScholarGoogle ScholarCross RefCross Ref
  115. Ravi K. Patel, Mukesh Jain, E. R. Mardis, Z. Wang, M. Gerstein, M. Snyder, R. Garg, R. K. Patel, A. K. Tyagi, M. Jain, R. Garg, R. K. Patel, S. Jhanwar, P. Priya, A. Bhattacharjee, A. Martinez-Alcantara, E. Ballesteros, F. M. Rojas, H. Koshinsky, V. Y. Fofanov, D. Blankenberg, A. Gordon, G. V. Kuster, N. Coraor, J. Taylor, M. P. Cox, D. A. Peterson, P. J. Biggs, R. Schmieder, Y. Lim, F. Rohwer, R. Edwards, R. Schmieder, R. Edwards, P. J. A. Cock, C. J. Fields, N. Goto, M. L. Heuer, P. M. Rice, M. Margulies, M. Egholm, W. E. Altman, S. Attiya, J. S. Bader, T. Lassmann, Y. Hayashizaki, C. O. Daub, M. Morgan, S. Anders, M. Lawrence, P. Aboyoun, H. Pages, R. V. Pandey, V. Nolte, and C. Schlotterer. 2012. NGS QC toolkit: A toolkit for quality control of next generation sequencing data. PLoS ONE 7, 2, e30619.Google ScholarGoogle ScholarCross RefCross Ref
  116. William R. Pearson and David J. Lipman. 1988. Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences 85, 8, 2444--2448.Google ScholarGoogle ScholarCross RefCross Ref
  117. Mihai Pop and Steven L. Salzberg. 2008. Bioinformatics challenges of new sequencing technology. Trends in Genetics 24, 3, 142--149.Google ScholarGoogle ScholarCross RefCross Ref
  118. J. Quick, N. J. Loman, S. Duraffour, J. T. Simpson, E. Severi, L. Cowley, J. A. Bore, R. Koundouno, G. Dudas, A. Mikhail, N. Ouedraogo, B. Afrough, A. Bah, J. H. Baum, B. Becker-Ziaja, J. P. Boettcher, M. Cabeza-Cabrerizo, A. Camino-Sanchez, L. L. Carter, J. Doerrbecker, T. Enkirch, I. Garcia-Dorival, N. Hetzelt, J. Hinzmann, T. Holm, L. E. Kafetzopoulou, M. Koropogui, A. Kosgey, E. Kuisma, C. H. Logue, A. Mazzarelli, S. Meisel, M. Mertens, J. Michel, D. Ngabo, K. Nitzsche, E. Pallasch, L. V. Patrono, J. Portmann, J. G. Repits, N. Y. Rickett, A. Sachse, K. Singethan, I. Vitoriano, R. L. Yemanaberhan, E. G. Zekeng, T. Racine, A. Bello, A. A. Sall, O. Faye, O. Faye, N. Magassouba, C. V. Williams, V. Amburgey, L. Winona, E. Davis, J. Gerlach, F. Washington, V. Monteil, M. Jourdain, M. Bererd, A. Camara, H. Somlare, A. Camara, M. Gerard, G. Bado, B. Baillet, D. Delaune, K. Y. Nebie, A. Diarra, Y. Savane, R. B. Pallawo, G. J. Gutierrez, N. Milhano, I. Roger, C. J. Williams, F. Yattara, K. Lewandowski, J. Taylor, P. Rachwal, D. J. Turner, G. Pollakis, J. A. Hiscox, D. A. Matthews, M. K. O’Shea, A. M. Johnston, D. Wilson, E. Hutley, E. Smit, A. Di Caro, R. Wolfel, K. Stoecker, E. Fleischmann, M. Gabriel, S. A. Weller, L. Koivogui, B. Diallo, S. Keita, A. Rambaut, P. Formenty, S. Gunther, and M. W. Carroll. 2016. Real-time, portable genome sequencing for Ebola surveillance. Nature 530, 7589, 228--232.Google ScholarGoogle Scholar
  119. A. R. Quinlan, R. A. Clark, S. Sokolova, M. L. Leibowitz, Y. Zhang, M. E. Hurles, J. C. Mell, and I. M. Hall. 2010. Genome-wide mapping and assembly of structural variant breakpoints in the mouse genome. Genome Research 20, 5, 623--635.Google ScholarGoogle ScholarCross RefCross Ref
  120. Richard Redon, Shumpei Ishikawa, Karen R. Fitch, Lars Feuk, George H. Perry, T. Daniel Andrews, Heike Fiegler, Michael H. Shapero, Andrew R. Carson, Wenwei Chen, Eun Kyung Cho, Stephanie Dallaire, Jennifer L. Freeman, Juan R. González, Mònica Gratacòs, Jing Huang, Dimitrios Kalaitzopoulos, Daisuke Komura, Jeffrey R. MacDonald, Christian R. Marshall, Rui Mei, Lyndal Montgomery, Kunihiro Nishimura, Kohji Okamura, Fan Shen, Martin J. Somerville, Joelle Tchinda, Armand Valsesia, Cara Woodwark, Fengtang Yang, Junjun Zhang, Tatiana Zerjal, Jane Zhang, Lluis Armengol, Donald F. Conrad, Xavier Estivill, Chris Tyler-Smith, Nigel P. Carter, Hiroyuki Aburatani, Charles Lee, Keith W. Jones, Stephen W. Scherer, and Matthew E. Hurles. 2006. Global variation in copy number in the human genome. Nature 444, 7118, 444--454.Google ScholarGoogle Scholar
  121. A. Rhoads and K. F. Au. 2015. PacBio sequencing and its applications. Genomics Proteomics Bioinformatics 13, 5, 278--289.Google ScholarGoogle ScholarCross RefCross Ref
  122. Manuel A. Rivas, Mélissa Beaudoin, Agnes Gardet, Christine Stevens, Yashoda Sharma, Clarence K. Zhang, Gabrielle Boucher, Stephan Ripke, David Ellinghaus, Noel Burtt, Tim Fennell, Andrew Kirby, Anna Latiano, Philippe Goyette, Todd Green, Jonas Halfvarson, Talin Haritunians, Joshua M. Korn, Finny Kuruvilla, Caroline Lagacé, Benjamin Neale, Ken Sin Lo, Phil Schumm, Leif Törkvist, Marla C. Dubinsky, Steven R. Brant, Mark S. Silverberg, Richard H. Duerr, David Altshuler, Stacey Gabriel, Guillaume Lettre, Andre Franke, Mauro D’Amato, Dermot P. B. McGovern, Judy H. Cho, John D. Rioux, Ramnik J. Xavier, Mark J. Daly, John D. Rioux, Ramnik J. Xavier, and Mark J. Daly. 2011. Deep resequencing of GWAS loci identifies independent rare variants associated with inflammatory bowel disease. Nature Genetics 43, 11, 1066--1073.Google ScholarGoogle ScholarCross RefCross Ref
  123. N. D. Roberts, R. D. Kortschak, W. T. Parker, A. W. Schreiber, S. Branford, H. S. Scott, G. Glonek, and D. L. Adelson. 2013. A comparative analysis of algorithms for somatic SNV detection in cancer. Bioinformatics 29, 18, 2223--2230.Google ScholarGoogle ScholarCross RefCross Ref
  124. Holger Rohde, Junjie Qin, Yujun Cui, Dongfang Li, Nicholas J. Loman, Moritz Hentschke, Wentong Chen, Fei Pu, Yangqing Peng, Junhua Li, et al. 2011. Open-source genomic analysis of Shiga-toxin--producing E. coli O104: H4. New England Journal of Medicine 365, 8, 718--724.Google ScholarGoogle ScholarCross RefCross Ref
  125. M. G. Ross, C. Russ, M. Costello, A. Hollinger, N. J. Lennon, R. Hegarty, C. Nusbaum, and D. B. Jaffe. 2013. Characterizing and measuring bias in sequence data. Genome Biol. 14, 5, R51.Google ScholarGoogle ScholarCross RefCross Ref
  126. M. Rubio-Camarillo, G. Gomez-Lopez, J. M. Fernandez, A. Valencia, and D. G. Pisano. 2013. RUbioSeq: A suite of parallelized pipelines to automate exome variation and bisulfite-seq analyses. Bioinformatics 29, 13, 1687--1689.Google ScholarGoogle ScholarCross RefCross Ref
  127. Nicole Rusk. 2009. Cheap third-generation sequencing. Nature Methods 6, 4, 244--244.Google ScholarGoogle ScholarCross RefCross Ref
  128. Nicole Rusk. 2011. Torrents of sequence. Nature Methods 8, 1, 44--44.Google ScholarGoogle Scholar
  129. Frederick Sanger, Steven Nicklen, and Alan R. Coulson. 1977. DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences 74, 12, 5463--5467.Google ScholarGoogle ScholarCross RefCross Ref
  130. Christopher T. Saunders, Wendy S. W. Wong, Sajani Swamy, Jennifer Becq, Lisa J. Murray, and R. Keira Cheetham. 2012. Strelka: Accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics (Oxford, England) 28, 14, 1811--7. Google ScholarGoogle ScholarDigital LibraryDigital Library
  131. Eric E. Schadt, Steve Turner, and Andrew Kasarskis. 2010. A window into third-generation sequencing. Human Molecular Genetics 19, R2 (2010), R227--R240.Google ScholarGoogle ScholarCross RefCross Ref
  132. Michael C. Schatz. 2009. CloudBurst: Highly sensitive read mapping with MapReduce. Bioinformatics (Oxford, England) 25, 11, 1363--9. Google ScholarGoogle ScholarDigital LibraryDigital Library
  133. Stephan C. Schuster. 2007. Next-generation sequencing transforms today’s biology. Nature 200, 8, 16--18.Google ScholarGoogle Scholar
  134. Jana Marie Schwarz, Christian Rödelsperger, Markus Schuelke, and Dominik Seelow. 2010. MutationTaster evaluates disease-causing potential of sequence alterations. Nature Methods 7, 8, 575--576.Google ScholarGoogle ScholarCross RefCross Ref
  135. Jay Shendure and Hanlee Ji. 2008. Next-generation DNA sequencing. Nature Biotechnology 26, 10, 1135--1145.Google ScholarGoogle ScholarCross RefCross Ref
  136. C. Sloggett, N. Goonasekera, and E. Afgan. 2013. BioBlend: Automating pipeline analyses within Galaxy and CloudMan. Bioinformatics 29, 13, 1685--1686.Google ScholarGoogle ScholarCross RefCross Ref
  137. L. F. Stead, K. M. Sutton, G. R. Taylor, P. Quirke, and P. Rabbitts. 2013. Accurately identifying low-allelic fraction variants in single samples with next-generation sequencing: Applications in tumor subclone resolution. Hum. Mutat. 34, 10, 1432--1438.Google ScholarGoogle ScholarCross RefCross Ref
  138. Zachary D. Stephens, Skylar Y. Lee, Faraz Faghri, Roy H. Campbell, Chengxiang Zhai, Miles J. Efron, Ravishankar Iyer, Michael C. Schatz, Saurabh Sinha, and Gene E. Robinson. 2015. Big data: Astronomical or genomical? PLoS Biology 13, 7, e1002195.Google ScholarGoogle ScholarCross RefCross Ref
  139. Bianca Stöcker, Johannes Köster, and Sven Rahmann. 2016. SimLoRD--Simulation of long read data. Bioinformatics 32, 17 (2016), 2704--2706.Google ScholarGoogle ScholarCross RefCross Ref
  140. Michael R. Stratton, Peter J. Campbell, and P. Andrew Futreal. 2009. The cancer genome. Nature 458, 7239, 719--724.Google ScholarGoogle Scholar
  141. Peter H. Sudmant, Tobias Rausch, Eugene J. Gardner, Robert E. Handsaker, Alexej Abyzov, John Huddleston, Yan Zhang, Kai Ye, Goo Jun, Markus Hsi-Yang Fritz, Miriam K. Konkel, Ankit Malhotra, Adrian M. Stütz, Xinghua Shi, Francesco Paolo Casale, Jieming Chen, Fereydoun Hormozdiari, Gargi Dayama, Ken Chen, Maika Malig, Mark J. P. Chaisson, Klaudia Walter, Sascha Meiers, Seva Kashin, Erik Garrison, Adam Auton, Hugo Y. K. Lam, Xinmeng Jasmine Mu, Can Alkan, Danny Antaki, Taejeong Bae, Eliza Cerveira, Peter Chines, Zechen Chong, Laura Clarke, Elif Dal, Li Ding, Sarah Emery, Xian Fan, Madhusudan Gujral, Fatma Kahveci, Jeffrey M. Kidd, Yu Kong, Eric-Wubbo Lameijer, Shane McCarthy, Paul Flicek, Richard A. Gibbs, Gabor Marth, Christopher E. Mason, Androniki Menelaou, Donna M. Muzny, Bradley J. Nelson, Amina Noor, Nicholas F. Parrish, Matthew Pendleton, Andrew Quitadamo, Benjamin Raeder, Eric E. Schadt, Mallory Romanovitch, Andreas Schlattl, Robert Sebra, Andrey A. Shabalin, Andreas Untergasser, Jerilyn A. Walker, Min Wang, Fuli Yu, Chengsheng Zhang, Jing Zhang, Xiangqun Zheng-Bradley, Wanding Zhou, Thomas Zichner, Jonathan Sebat, Mark A. Batzer, Steven A. McCarroll, Ryan E. Mills, Mark B. Gerstein, Ali Bashir, Oliver Stegle, Scott E. Devine, Charles Lee, Evan E. Eichler, Jan O. Korbel, and Jan O. Korbel. 2015. An integrated map of structural variation in 2,504 human genomes. Nature 526, 7571, 75--81.Google ScholarGoogle Scholar
  142. Tamas Szalay and Jene A. Golovchenko. 2015. De novo sequencing and variant calling with Nanopores using PoreSeq. Nature Biotechnology 33, 10, 1087--1091.Google ScholarGoogle ScholarCross RefCross Ref
  143. Y. Tateno, T. Imanishi, S. Miyazaki, K. Fukami-Kobayashi, N. Saitou, H. Sugawara, and T. Gojobori. 2002. DNA Data Bank of Japan (DDBJ) for genome scale research in life science. Nucleic Acids Research 30, 1, 27--30.Google ScholarGoogle ScholarCross RefCross Ref
  144. GB Editorial Team. 2011. Closure of the NCBI SRA and implications for the long-term future of genomics data storage. 1--3.Google ScholarGoogle Scholar
  145. Helga Thorvaldsdóttir, James T. Robinson, and Jill P. Mesirov. 2013. Integrative Genomics Viewer (IGV): High-performance genomics data visualization and exploration. Briefings in Bioinformatics 14, 2, 178--192.Google ScholarGoogle ScholarCross RefCross Ref
  146. Erwin L. van Dijk, Hélène Auger, Yan Jaszczyszyn, and Claude Thermes. 2014. Ten years of next-generation sequencing technology. Trends in Genetics 30, 9, 418--426.Google ScholarGoogle ScholarCross RefCross Ref
  147. Yanqing Wang, Fuhai Song, Junwei Zhu, Sisi Zhang, Yadong Yang, Tingting Chen, Bixia Tang, Lili Dong, Nan Ding, Qian Zhang, et al. 2017. GSA: Genome sequence archive. Genomics, Proteomics and Bioinformatics 15, 1 (2017), 14--18.Google ScholarGoogle ScholarCross RefCross Ref
  148. Mick Watson, Marian Thomson, Judith Risse, Richard Talbot, Javier Santoyo-Lopez, Karim Gharbi, and Mark Blaxter. 2015. poRe: An R package for the visualization and analysis of Nanopore sequencing data. Bioinformatics 31, 1, 114--115.Google ScholarGoogle ScholarCross RefCross Ref
  149. Simon J. Watson, Matthijs R. A. Welkers, Daniel P. Depledge, Eve Coulter, Judith M. Breuer, Menno D. de Jong, Paul Kellam, D. D. Richman, E. M. Bunnik, A. Moya, E. Holmes, F. González-Candelas, C. Wang, Y. Mitsuya, B. Gharizadeh, M. Ronaghi, R. W. Shafer, J. Archer, M. S. Braverman, B. E. Taillon, B. Desany, I. James, P. R. Harrigan, M. Lewis, D. L. Robertson, N. Eriksson, L. Pachter, Y. Mitsuya, S-Y. Rhee, C. Wang, B. Gharizadeh, M. Ronaghi, R. W. Shafer, N. Beerenwinkel, J. Archer, G. Baillie, S. J. Watson, P. Kellam, A. Rambaut, D. L. Robertson, K. Nakamura, S. M. Huse, J. A. Huber, H. G. Morrison, M. L. Sogin, D. M. Welch, A. R. Quinian, D. A. Stewart, M. P. Strömberg, G. T. Marth, R. V. Pandey, V. Nolte, J. Boenigk, C. Schlötterer, R. Schmieder, R. Edwards, R. V. Patel, M. Jain, Z. Ning, A. J. Cox, J. C. Mullikin, H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, G. Baillie, M. L. Metzker, and A. McKenna. 2013. Viral population analysis and minority-variant detection using short read next-generation sequencing. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 368, 1614, 20120205.Google ScholarGoogle Scholar
  150. Joachim Weischenfeldt, Orsolya Symmons, François Spitz, and Jan O. Korbel. 2013. Phenotypic impact of genomic structural variation: Insights from and for human disease. Nature Reviews Genetics 14, 2, 125--138.Google ScholarGoogle ScholarCross RefCross Ref
  151. David A. Wheeler, Maithreyan Srinivasan, Michael Egholm, Yufeng Shen, Lei Chen, Amy McGuire, Wen He, Yi-Ju Chen, Vinod Makhijani, G. Thomas Roth, et al. 2008. The complete genome of an individual by massively parallel DNA sequencing. Nature 452, 7189, 872--876.Google ScholarGoogle Scholar
  152. K. Wong, T. M. Keane, J. Stalker, and D. J. Adams. 2010. Enhanced structural variant and breakpoint detection using SVMerge by integration of multiple detection methods and local assembly. Genome Biol. 11, 12, R128.Google ScholarGoogle ScholarCross RefCross Ref
  153. Ka-Chun Wong and Zhaolei Zhang. 2014. SNPdryad: Predicting deleterious non-synonymous human SNPs using only orthologous protein sequences. Bioinformatics (Oxford, England) 30, 8, 1112--1119.Google ScholarGoogle Scholar
  154. Chao Xie, Martti T. Tammi, J. Sebat, B. Lakshmi, J. Troge, J. Alexander, J. Young, P. Lundin, S. Månér, H. Massa, M. Walker, M. Chi, N. Navin, R. Lucito, J. Healy, J. Hicks, K. Ye, A. Reiner, T. C. Gilliam, B. Trask, N. Patterson, A. Zetterberg, M. Wigler, A. J. Iafrate, L. Feuk, M. N. Rivera, M. L. Listewnik, P. K. Donahoe, Y. Qi, S. W. Scherer, K. C. Woodwark, G. Cameron, R. Durbin, A. Cox, T. Hubbard, M. Clamp, and W. J. Kent. 2009. CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics 10, 1, 80.Google ScholarGoogle ScholarCross RefCross Ref
  155. Haibin Xu, Xiang Luo, Jun Qian, Xiaohui Pang, Jingyuan Song, Guangrui Qian, Jinhui Chen, and Shilin Chen. 2012. FastUniq: A fast de novo duplicates removal tool for paired short reads. PLoS ONE 7, 12 (2012), e52249.Google ScholarGoogle ScholarCross RefCross Ref
  156. Kai Ye, Marcel H. Schulz, Quan Long, Rolf Apweiler, and Zemin Ning. 2009. Pindel: A pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 21, 2865--2871. Google ScholarGoogle ScholarDigital LibraryDigital Library
  157. Ming Yi, Yongmei Zhao, Li Jia, Mei He, Electron Kebebew, and Robert M. Stephens. 2014. Performance comparison of SNP detection tools with Illumina exome sequencing data an assessment using both family pedigree information and sample matched SNP array data. Nucleic Acids Research 42, 12, e101--e101.Google ScholarGoogle ScholarCross RefCross Ref
  158. Yongchao Yongchao Liu and Bertil Schmidt. 2014. CUSHAW2-GPU: Empowering faster gapped short-read alignment using GPU computing. IEEE Design and Test 31, 1, 31--39.Google ScholarGoogle ScholarCross RefCross Ref
  159. S. Yoon, Z. Xuan, V. Makarov, K. Ye, and J. Sebat. 2009. Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Research 19, 9, 1586--1592.Google ScholarGoogle ScholarCross RefCross Ref
  160. Y. William Yu, Deniz Yorukoglu, Jian Peng, and Bonnie Berger. 2015. Quality score compression improves genotyping accuracy. Nature Biotechnology 33, 3, 240--243.Google ScholarGoogle ScholarCross RefCross Ref
  161. Peng Yue, Eugene Melamud, John Moult, P. D. Stenson, E. V. Ball, M. Mort, A. D. Phillips, J. A. Shiel, N. S. Thomas, S. Abeysinghe, M. Krawczak, D. N. Cooper, S. T. Sherry, M. H. Ward, M. Kholodov, J. Baker, L. Phan, E. M. Smigielski, K. Sirotkin, G. D. Bader, D. Betel, C. W. Hogue, M. Kanehisa, S. Goto, S. Kawashima, Y. Okuno, M. Hattori, B. J. Stapley, G. Benoit, N. Daraselia, A. Yuryev, S. Egorov, S. Novichkova, M. K. Halushka, J. B. Fan, K. Bentley, L. Hsie, N. Shen, A. Weder, R. Cooper, R. Lipshutz, and A. Chakravarti. 2006. SNPs3D: Candidate gene and SNP selection for association studies. BMC Bioinformatics 7, 1, 166.Google ScholarGoogle ScholarCross RefCross Ref
  162. Daniel R. Zerbino and Ewan Birney. 2008. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs.Genome Research 18, 5, 821--9.Google ScholarGoogle Scholar
  163. Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller. 2000. A greedy algorithm for aligning DNA sequences. Journal of Computational Biology 7, 1--2, 203--214.Google ScholarGoogle ScholarCross RefCross Ref
  164. Qian Zhou, Xiaoquan Su, Anhui Wang, Jian Xu, and Kang Ning. 2013. QC-chain: Fast and holistic quality control method for next-generation sequencing data. PLoS ONE 8, 4, e60234.Google ScholarGoogle ScholarCross RefCross Ref

    • Published in

      ACM Computing Surveys  Volume 52, Issue 5
      September 2020
      791 pages
      ISSN:0360-0300
      EISSN:1557-7341
      DOI:10.1145/3362097
      • Editor: Sartaj Sahni

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 13 September 2019
      • Accepted: 1 June 2019
      • Revised: 1 January 2019
      • Received: 1 December 2017

      Qualifiers

      • survey
      • Research
      • Refereed
