ABSTRACT
Motivation: DNA metabarcoding is commonly used to infer the species composition of environmental samples: a short, homologous DNA sequence is amplified and sequenced from all members of the community. Samples can comprise hundreds of organisms that may be closely or very distantly related. DNA metabarcoding combines polymerase chain reaction (PCR) and next-generation sequencing (NGS), and sequences are taxonomically identified based on their match to a reference database. Ideally, each species of interest would have a unique DNA barcode. This short, variable sequence needs to be flanked by conserved regions that can serve as primer-binding sites, so that a PCR primer pair amplifies a variable barcode across a broad evolutionary range of taxa. To date, no tools exist that computationally search for new primer pairs and analyze their effectiveness on large unaligned sequence data sets. More specifically, we solve the following problem: given a set of reference sequences R = {R1, R2, ..., Rm}, find a primer set P that allows for a high taxonomic coverage. This goal can be achieved by filtering for frequent primers and ranking them for further analysis by coverage or by variation, i.e., the number of unique barcodes. Here we present the software PriSeT, an offline primer-discovery tool that is capable of processing large libraries and is robust against mislabeled or low-quality references. It avoids the construction of a multiple sequence alignment of R. Instead, PriSeT uses encodings of frequent k-mers that allow bit-parallel processing and other optimizations.
Results: We first evaluated PriSeT on references (mostly 18S rRNA genes) from 19 clades covering eukaryotic organisms that are typical for freshwater plankton samples. PriSeT recovered several published primer sets as well as additional, more chemically suitable primer sets. For these new sets, we compared frequency, taxonomic coverage, and amplicon variation with published primer sets. For 11 clades we found de novo primer pairs that cover more taxa than the published ones, and for six clades de novo primers resulted in greater sequence (i.e., DNA barcode) variation. We also applied PriSeT to SARS-CoV-2 genomes and computed 114 new primer pairs with the additional constraint that the sequences have no co-occurrences in closely related taxa. These primer sets would be suitable for empirical testing.
Availability: https://github.com/mariehoffmann/PriSeT
Contact: [email protected]
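The idea of encoding k-mers compactly and keeping only the frequent ones, as described under Motivation, can be illustrated with a minimal Python sketch. This is not PriSeT's implementation: the 2-bit alphabet mapping, the function names (`encode_kmers`, `frequent_kmers`, `decode`) and the `min_fraction` parameter are assumptions made for illustration, and the sketch counts in how many references a k-mer occurs rather than reproducing the tool's actual filtering or chemical constraints.

```python
from collections import defaultdict

# Hypothetical 2-bit code per nucleotide (an assumption, not PriSeT's format).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmers(seq, k):
    """Yield the 2-bit integer encoding of every k-mer in seq.

    The encoding is updated in a rolling fashion: shift in the new base
    and mask off the base that left the window.
    """
    mask = (1 << (2 * k)) - 1
    value, valid = 0, 0
    for base in seq.upper():
        code = CODE.get(base)
        if code is None:
            # Ambiguous base (e.g. N): restart the window.
            value, valid = 0, 0
            continue
        value = ((value << 2) | code) & mask
        valid += 1
        if valid >= k:
            yield value


def frequent_kmers(references, k, min_fraction):
    """Return encoded k-mers occurring in at least min_fraction of references."""
    counts = defaultdict(int)
    for seq in references:
        # Use a set so each reference contributes at most once per k-mer.
        for kmer in set(encode_kmers(seq, k)):
            counts[kmer] += 1
    threshold = min_fraction * len(references)
    return {kmer for kmer, n in counts.items() if n >= threshold}


def decode(value, k):
    """Convert a 2-bit encoded k-mer back to its nucleotide string."""
    return "".join("ACGT"[(value >> (2 * (k - 1 - i))) & 3] for i in range(k))


if __name__ == "__main__":
    refs = ["ACGTACGT", "TTACGTAA", "GGGGACGT"]
    shared = frequent_kmers(refs, k=4, min_fraction=1.0)
    print(sorted(decode(v, 4) for v in shared))
```

Because each k-mer is a single integer, comparing or hashing candidates costs one machine-word operation for k up to 32, which is the kind of bit-parallel processing the abstract alludes to. No alignment of the references is needed for this counting step.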