Abstract
We studied the sets of avoided strings observed over a family of genomes. The length of the minimal avoided string rarely exceeds 9 nucleotides, regardless of the phylogeny of the genome under consideration. The lists of avoided strings observed over sets of related genomes were analyzed. Very low correlation was found between phylogeny and the set of those strings.
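To make the notion of a minimal avoided string concrete, the following is a minimal sketch and not the authors' implementation. It assumes that an avoided string of length k is any word over the alphabet {A, C, G, T} that never occurs in the given sequence, and it scans only the given strand (whether the reverse complement should also be scanned is left open here). The function names `avoided_strings` and `minimal_avoided_length` and the parameter `max_k` are hypothetical, introduced for illustration only.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: only the four canonical nucleotides are counted


def avoided_strings(sequence, k):
    """Return the sorted list of k-mers over ACGT that never occur in sequence."""
    seq = sequence.upper()
    # Collect every window of length k that is actually present.
    observed = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    # Enumerate all 4**k candidate words; feasible only for small k.
    candidates = ("".join(p) for p in product(ALPHABET, repeat=k))
    return sorted(word for word in candidates if word not in observed)


def minimal_avoided_length(sequence, max_k=12):
    """Return the smallest k <= max_k with at least one absent k-mer, else None."""
    for k in range(1, max_k + 1):
        if avoided_strings(sequence, k):
            return k
    return None


if __name__ == "__main__":
    toy = "ACGT"  # toy input, not real genomic data
    print(minimal_avoided_length(toy))
    print(avoided_strings("AAAA", 1))
```

Exhaustive enumeration keeps the sketch short; for real genomes a suffix-based index or a bit array over encoded k-mers would be the more usual choice, since memory and time grow as 4^k.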
Acknowledgement
This study was supported by a research grant # 14.Y26.31.0004 from the Government of the Russian Federation.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Sadovsky, M., Fontaine, JF., Andrade-Navarro, M.A., Yakubailik, Y., Rudenko, N. (2017). Lost Strings in Genomes: What Sense Do They Make? In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10209. Springer, Cham. https://doi.org/10.1007/978-3-319-56154-7_3
DOI: https://doi.org/10.1007/978-3-319-56154-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56153-0
Online ISBN: 978-3-319-56154-7
eBook Packages: Computer Science, Computer Science (R0)