DOI: 10.1145/2808719.2811456
poster

Investigating genome similarity through cross mapping percentage

Published: 09 September 2015

ABSTRACT

A necessary step in many metagenomic studies is determining which organisms are present in a sample. Knowing how similar the genomes of those organisms are allows high-throughput sequencing reads to be mapped more accurately to the correct genome for expression quantification. This study investigates current metrics of genome similarity as they relate to cross mapping percentage, defined as the percentage of sequence reads from one organism that map to another organism's genome, and aims to establish a new metric for genome similarity that incorporates cross mapping percentage. Paired-end reads were generated using Artificial FASTQ Generator (AFG) for 10 organisms in two categories: host and pathogen. The reads were mapped to reference genomes with Bowtie2, and the cross mapping percentage was calculated from the mapping results. Cross mapping percentages were higher for organisms with a lower calculated genomic distance, which led to the conclusion that hosts and pathogens could easily be distinguished, while pathogens and other microbial genomes were harder to separate from one another. The genomes were also aligned using MUMmer to determine an overall percent similarity between the sequences. A metric for genome similarity was established by modifying the formulas used in DSMZ's Genome-to-Genome Distance Calculator (GGDC) to incorporate cross mapping percentages. Modifying the formulas did not change the trend in genomic distance values, which supports the view that cross mapping percentage, distance calculated with the original formulas, and distance calculated with the new formulas are interchangeable. This work helps establish the resolution at which organisms in a sample can be distinguished using whole genome sequence information, that is, how similar organisms can be and still be distinguished in a metagenomic study for the purposes of computing expression values.
These findings allow organisms in metagenomic studies to be identified more reliably and expression to be quantified more accurately in metatranscriptomic studies.
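
The definition of cross mapping percentage given in the abstract can be illustrated with a minimal Python sketch. The function names, the organism labels, and the read counts below are hypothetical and chosen only for illustration; they are not the authors' code or data, and the sketch assumes the mapped-read counts have already been extracted from an aligner's output.

```python
from typing import Dict, Tuple


def cross_mapping_percentage(mapped_reads: int, total_reads: int) -> float:
    """Percentage of reads from one organism that map to another organism's genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must lie between 0 and total_reads")
    return 100.0 * mapped_reads / total_reads


def cross_mapping_table(
    mapped: Dict[Tuple[str, str], int], totals: Dict[str, int]
) -> Dict[Tuple[str, str], float]:
    """Map each (read source, reference genome) pair to its cross mapping percentage.

    Pairs where the source and the reference are the same organism are skipped,
    because those are ordinary self-mapping rates rather than cross mapping.
    """
    return {
        (source, reference): cross_mapping_percentage(count, totals[source])
        for (source, reference), count in mapped.items()
        if source != reference
    }


# Hypothetical counts, for illustration only (not data from the study).
totals = {"pathogen_A": 1000, "pathogen_B": 1000, "host": 2000}
mapped = {
    ("pathogen_A", "pathogen_B"): 250,
    ("pathogen_B", "pathogen_A"): 300,
    ("pathogen_A", "host"): 5,
    ("host", "pathogen_A"): 4,
}
table = cross_mapping_table(mapped, totals)

for (source, reference), pct in sorted(table.items()):
    print(f"{source} reads -> {reference} genome: {pct:.2f}%")
```

Note that the measure is directional: reads from organism A mapping to the genome of B need not give the same percentage as reads from B mapping to the genome of A, so the table keeps both orderings of each pair.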

References

  1. Frampton, Matthew, and Richard Houlston. "Generation of Artificial FASTQ Files to Evaluate the Performance of Next-Generation Sequencing Pipelines." PLoS ONE 7.11 (2012): Web. 29 July 2015.
  2. Kurtz, Stefan, et al. "Versatile and Open Software for Comparing Large Genomes." Genome Biology 5.2 (2004): Web. 29 July 2015.
  3. Langmead B, Salzberg SL: Fast gapped-read alignment with Bowtie 2. Nat Methods 2012, 9:357–359.
  4. Meier-Kolthoff, J. P., Auch, A. F., Klenk, H.-P., Göker, M. Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinformatics 14:60, 2013.
  5. Monaco MK et al. (2014). Gramene 2013: comparative plant genomics resources. Nucleic Acids Res. 42 (D1): D1193–D1199. PMID: 24217918. doi: 10.1093/nar/gkt1110.
  6. National Center for Biotechnology Information (NCBI). Web. http://www.ncbi.nlm.nih.gov/nuccore.

        • Published in

          BCB '15: Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics
          September 2015
          683 pages
          ISBN:9781450338530
          DOI:10.1145/2808719

          Copyright © 2015 Owner/Author

          Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          BCB '15 Paper Acceptance Rate: 48 of 141 submissions, 34%
          Overall Acceptance Rate: 254 of 885 submissions, 29%