Abstract
Primer and adapter sequences are synthetic DNA or RNA oligonucleotides used in amplification and sequencing. In theory, sequences similar to primers may occur in assembled genomes, whereas adapter sequences should be trimmed (filtered) and hence absent from them. In practice, because of ambiguity problems, poorly parameterized trimming tools, and other causes, they can occasionally be found in assembled genomes, in exact or approximate form. In this paper, we investigate the occurrence of exact and approximate primer-adapter subsequences in assembled genomes, specifically in the whole archaeal genomes of the NCBI database. We present a new method that combines data compression with custom signal processing operations, namely filtering and segmentation, to localize and visualize these regions given a defined similarity threshold. The program is freely available, under the GPLv3 license, at https://github.com/pratas/maple.
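To make the pipeline described in the abstract more concrete, the following is a minimal Python sketch of the general idea: score each genome position against a primer-adapter reference, smooth the resulting profile, and segment it with a threshold. It is not the implementation used by the program. The compression model is replaced by a toy k-mer lookup with fixed costs, and all function names, parameters, and default values (`hit_bits`, `miss_bits`, `k`, `window`, `threshold`) are assumptions made for illustration.

```python
def kmer_set(reference, k):
    """Collect every k-mer that occurs in the reference sequence."""
    return {reference[i:i + k] for i in range(len(reference) - k + 1)}


def cost_profile(target, ref_kmers, k, hit_bits=0.1, miss_bits=2.0):
    """Toy stand-in for a compression profile (assumption, not a real model).

    A position gets a low cost when the k-mer ending there was seen in the
    reference, and a high cost otherwise.
    """
    costs = []
    for i in range(len(target)):
        seen = i + 1 >= k and target[i + 1 - k:i + 1] in ref_kmers
        costs.append(hit_bits if seen else miss_bits)
    return costs


def smooth(values, window):
    """Filtering step: centered moving average, truncated at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def segment(values, threshold):
    """Segmentation step: half-open [start, end) runs below the threshold."""
    regions = []
    start = None
    for i, v in enumerate(values):
        if v < threshold and start is None:
            start = i
        elif v >= threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(values)))
    return regions


def find_regions(target, reference, k=4, window=5, threshold=1.0):
    """Chain the three steps: profile, filter, segment."""
    profile = cost_profile(target, kmer_set(reference, k), k)
    return segment(smooth(profile, window), threshold)


# Hypothetical example: an adapter-like string embedded in a synthetic genome.
adapter = "AGATCGGAAGAGC"
genome = "T" * 30 + adapter + "C" * 30
regions = find_regions(genome, adapter)
```

In this toy setting, lowering `threshold` makes the detection stricter, while a larger `window` tolerates isolated mismatches inside an otherwise similar region at the price of blurrier boundaries.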
Acknowledgements
This work was partially funded by FEDER (Programa Operacional Factores de Competitividade - COMPETE) and by National Funds through the FCT, in the context of the projects UID/CEC/00127/2019 & PTCD/EEI-SII/6608/2014 and the grant PD/BD/113969/2015 to MH.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Pratas, D., Hosseini, M., Pinho, A.J. (2020). Visualization of Similar Primer and Adapter Sequences in Assembled Archaeal Genomes. In: Fdez-Riverola, F., Rocha, M., Mohamad, M., Zaki, N., Castellanos-Garzón, J. (eds) Practical Applications of Computational Biology and Bioinformatics, 13th International Conference. PACBB 2019. Advances in Intelligent Systems and Computing, vol 1005. Springer, Cham. https://doi.org/10.1007/978-3-030-23873-5_16
DOI: https://doi.org/10.1007/978-3-030-23873-5_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23872-8
Online ISBN: 978-3-030-23873-5
eBook Packages: Intelligent Technologies and Robotics (R0)