Abstract
A dictionary-based analysis of bacterial genomes is performed using specific k-long factors (called res) and their maximal right elongations along the genome (called spectral segments), in order to find biomarkers that discriminate at the genus and species levels. The analysis builds on a previously introduced k-mer-based approach, applied here to genomes of different bacterial taxa. Intervals of values of k are identified that yield meaningful genomic fragments, whose collection is a suitable representation for comparing genomes by means of informational indexes and Jaccard similarity matrices. Corresponding dictionaries of k-mers are identified that discriminate bacterial genomes at the genus and species levels. The approach appears competitive with traditional barcoding methods in terms of both performance (e.g., species discrimination) and size.
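As a rough illustration of the comparison step mentioned in the abstract, the following Python sketch builds a dictionary (set) of k-mers per genome and computes a pairwise Jaccard similarity matrix. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the toy sequences, and the choice of k are hypothetical, and the selection of res and spectral segments is not reproduced because their definitions are not given here.

```python
def kmer_dictionary(genome: str, k: int) -> set:
    """Return the set of distinct k-long factors (k-mers) of a genome string."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Slide a window of width k over the sequence; the set keeps one copy of each k-mer.
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two k-mer dictionaries."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty dictionaries count as identical
    return len(a & b) / len(a | b)


def jaccard_matrix(genomes: dict, k: int):
    """Return (names, matrix), where matrix[i][j] compares genomes i and j."""
    names = list(genomes)
    dictionaries = [kmer_dictionary(genomes[name], k) for name in names]
    matrix = [[jaccard(x, y) for y in dictionaries] for x in dictionaries]
    return names, matrix


if __name__ == "__main__":
    # Hypothetical toy sequences, far shorter than any real bacterial genome.
    toy_genomes = {"toy_a": "ACGTAC", "toy_b": "ACGG"}
    names, matrix = jaccard_matrix(toy_genomes, k=2)
    for name, row in zip(names, matrix):
        print(name, row)
```

With real genomes, k would be taken from the intervals the paper identifies rather than the toy value used here, and the sets would hold the selected fragments instead of all k-mers.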
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Astorino, S., Bonnici, V., Franco, G. (2023). An Investigation to Test Spectral Segments as Bacterial Biomarkers. In: Genova, D., Kari, J. (eds) Unconventional Computation and Natural Computation. UCNC 2023. Lecture Notes in Computer Science, vol 14003. Springer, Cham. https://doi.org/10.1007/978-3-031-34034-5_1
DOI: https://doi.org/10.1007/978-3-031-34034-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34033-8
Online ISBN: 978-3-031-34034-5
eBook Packages: Computer Science, Computer Science (R0)