DOI: 10.1145/2147805.2147862
short-paper

Mapping short sequencing reads to distant relatives

Published: 01 August 2011 Publication History

Abstract

Numerous algorithmic approaches have been developed to map the short reads produced by next-generation sequencing technologies onto reference genome sequences. When sufficiently close reference genomes do not exist, as is the case when analyzing diverse environmental samples, less rigorous approaches must be taken. We have developed a new suite of data structures and algorithms specifically for mapping reads from environmental sequencing projects, together with a pipeline that can rigorously map reads to genomes even when many mismatches separate the two. We present a performance analysis of our approach using more than 50 million reads generated from soil samples.
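
To make the problem of mapping a read "with many mismatches" concrete, here is a minimal brute-force sketch. It is not the data structures or pipeline described in the paper, whose details the abstract does not give. It only illustrates ungapped mapping under a substitution budget, and the names `hamming`, `map_read`, and `max_mismatches` are hypothetical.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: str, max_mismatches: int) -> List[Tuple[int, int]]:
    """Return (position, mismatches) for every ungapped placement of `read`
    on `reference` that has at most `max_mismatches` substitutions.

    Illustrative assumption: an exhaustive O(len(reference) * len(read)) scan,
    not an indexed method such as the one the paper develops.
    """
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = hamming(read, window)
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


reference = "ACGTACGTTAGCCGATAGGCTA"
read = "TAGCGGTTAG"  # differs from reference[8:18] at two positions
print(map_read(read, reference, max_mismatches=3))
```

Raising `max_mismatches` admits more divergent placements but also more spurious ones, which is the trade-off any mapper for distant relatives has to manage.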


        Published In

        BCB '11: Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine
        August 2011
        688 pages
        ISBN:9781450307963
        DOI:10.1145/2147805
        General Chairs: Robert Grossman, Andrey Rzhetsky
        Program Chairs: Sun Kim, Wei Wang
        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. metagenomic sequence analysis
        2. short-read mapping

        Qualifiers

        • Short-paper

        Conference

        BCB '11

        Acceptance Rates

        Overall Acceptance Rate 254 of 885 submissions, 29%
