DOI: 10.1145/3459930.3469546
research-article

PriSeT: efficient de novo primer discovery

Published: 01 August 2021

ABSTRACT

Motivation: DNA metabarcoding is commonly used to infer the species composition of environmental samples: a short, homologous DNA sequence is amplified and sequenced from all members of the community. Samples can comprise hundreds of organisms that may be closely or very distantly related. DNA metabarcoding combines polymerase chain reaction (PCR) and next-generation sequencing (NGS), and sequences are taxonomically identified based on their match to a reference database. Ideally, each species of interest would have a unique DNA barcode. This short, variable sequence needs to be flanked by conserved regions that can serve as primer-binding sites, so that a PCR primer pair amplifies a variable barcode across a broad evolutionary range of taxa. To date, no tools exist that computationally search for new primer pairs and analyze their effectiveness on large unaligned sequence data sets. More specifically, we solve the following problem: given a set of reference sequences R = {R1, R2, ..., Rm}, find a primer set P that allows for high taxonomic coverage. This goal can be achieved by filtering for frequent primers and ranking them by coverage or by variation (i.e., the number of unique barcodes) for further analysis. Here we present the software PriSeT, an offline primer-discovery tool that is capable of processing large libraries and is robust against mislabeled or low-quality references. It avoids the construction of a multiple sequence alignment of R. Instead, PriSeT uses encodings of frequent k-mers that allow bit-parallel processing and other optimizations.
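To make the idea of encoded frequent k-mers more concrete, the following is a minimal sketch and not PriSeT's actual implementation. It assumes a 2-bit code per nucleotide (the mapping A=0, C=1, G=2, T=3 is an arbitrary choice), packs each k-mer into one integer with a rolling shift-and-mask update, and keeps the k-mers that occur in at least a given fraction of the references. The function names `encode_kmers`, `frequent_kmers`, and `decode`, and the parameter `min_fraction`, are hypothetical.

```python
from collections import defaultdict

# 2-bit code per nucleotide; this particular mapping is an assumption.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmers(seq, k):
    """Yield (position, packed integer) for every unambiguous k-mer of seq."""
    mask = (1 << (2 * k)) - 1  # keeps only the lowest 2*k bits
    value = 0
    valid = 0  # consecutive unambiguous bases seen so far
    for i, base in enumerate(seq.upper()):
        code = CODE.get(base)
        if code is None:  # ambiguous base such as 'N': restart the window
            value = 0
            valid = 0
            continue
        # Shift in the new base and drop the oldest one in a single step.
        value = ((value << 2) | code) & mask
        valid += 1
        if valid >= k:
            yield i - k + 1, value


def frequent_kmers(references, k, min_fraction):
    """Map each packed k-mer to the number of references containing it,
    keeping only k-mers present in at least min_fraction of the references."""
    counts = defaultdict(int)
    for seq in references:
        # Count each k-mer at most once per reference.
        for code in {c for _, c in encode_kmers(seq, k)}:
            counts[code] += 1
    threshold = min_fraction * len(references)
    return {c: n for c, n in counts.items() if n >= threshold}


def decode(code, k):
    """Turn a packed integer back into its k-mer string."""
    return "".join("ACGT"[(code >> (2 * (k - 1 - j))) & 3] for j in range(k))
```

Because a k-mer with k up to 32 fits into a single 64-bit word under this encoding, comparisons and hash lookups operate on integers rather than strings, which is the kind of representation that makes bit-parallel processing possible.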

Results: We first evaluated PriSeT on references (mostly 18S rRNA genes) from 19 clades covering eukaryotic organisms that are typical for freshwater plankton samples. PriSeT recovered several published primer sets as well as additional, more chemically suitable primer sets. For these new sets, we compared frequency, taxonomic coverage, and amplicon variation with published primer sets. For 11 clades we found de novo primer pairs that cover more taxa than the published ones, and for six clades de novo primers resulted in greater sequence (i.e., DNA barcode) variation. We also applied PriSeT to SARS-CoV-2 genomes and computed 114 new primer pairs with the additional constraint that the sequences have no co-occurrences in closely related taxa. These primer sets would be suitable for empirical testing.
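The two ranking criteria named above, taxonomic coverage and amplicon (barcode) variation, can be illustrated with a small in silico check. This is a simplified sketch under stated assumptions, not the evaluation procedure used in the paper: primers must match exactly, only the first binding site per sequence is considered, and no chemical constraints such as melting temperature are checked. The function `evaluate_pair` and the layout of `references` (taxon label mapped to a list of sequences) are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def evaluate_pair(references, forward, reverse):
    """Return (coverage, variation) for one primer pair.

    references: dict mapping a taxon label to a list of sequences (assumed layout).
    coverage:   number of taxa with at least one sequence bound by both primers.
    variation:  number of distinct barcodes between the two binding sites.
    """
    # The reverse primer binds the opposite strand, so on the given strand
    # we look for its reverse complement downstream of the forward primer.
    rev_site = reverse_complement(reverse)
    covered = set()
    barcodes = set()
    for taxon, seqs in references.items():
        for seq in seqs:
            start = seq.find(forward)
            if start < 0:
                continue
            end = seq.find(rev_site, start + len(forward))
            if end < 0:
                continue
            covered.add(taxon)
            barcodes.add(seq[start + len(forward):end])
    return len(covered), len(barcodes)


refs = {
    "t1": ["AAACCCGGTT"],
    "t2": ["TAAACGAGGTTA"],
    "t3": ["AAACCCGGTT", "CCCCCC"],
    "t4": ["GGGGGGGG"],
}
coverage, variation = evaluate_pair(refs, "AAAC", "AACC")
```

In this toy input three of the four taxa are covered, but `t1` and `t3` share the same barcode, so coverage and variation differ. That difference is why a primer pair can rank well on one criterion and poorly on the other.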

Availability: https://github.com/mariehoffmann/PriSeT

Contact: [email protected]

        • Published in

          BCB '21: Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
          August 2021
          603 pages
          ISBN:9781450384506
          DOI:10.1145/3459930

          Copyright © 2021 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 254 of 885 submissions, 29%