Abstract
Genomic Information Systems (GenISs) have recently been proposed to provide a universal framework for feature extraction, dimensionality reduction and more effective processing of genomic data. They are based on methodologies more firmly anchored in biochemical reality, and they exploit the newly discovered structure of DNA spaces to extract and represent genomic data in compact data structures rich enough to answer critical questions about the original organisms, including phylogenies, species identification and, more recently, phenotypic information. They work from DNA sequences alone (possibly full genomes), in a matter of minutes or hours, and produce answers consistent with well-established and accepted biological knowledge. Here, we introduce a second family of GenISs based on further structural properties of DNA spaces and demonstrate that they could also be used to provide principled, general and intuitive solutions to fundamental questions in biology such as “What exactly is a biological species?” Current answers to such all-important questions have remained dependent on specific taxa and subject to analyst choices. We further discuss other applications to be explored in the future, including universal biological taxonomies in the quest for a truly universal and comprehensive “Atlas of Life”, as it is or as it could be on Earth.
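The abstract describes GenISs only at a high level. The acknowledgements mention DNA space centroids, pmeric feature vectors and Voronoi diagrams, and the sketch below shows one generic way such pieces can fit together for species identification. It is not the authors' method. The choice of p-mers, the use of exact-match frequency counts (rather than any hybridization-based measure), the Euclidean distance, and all function names (`pmer_feature_vector`, `centroid`, `nearest_centroid`) are assumptions made for illustration. Assigning a query to its nearest centroid is equivalent to locating it in that centroid's Voronoi cell.

```python
from collections import Counter
from math import dist


def pmer_feature_vector(seq, pmers):
    """Hypothetical feature map: relative frequency of each p-mer in `pmers`
    among all overlapping length-p windows of `seq` (exact matches only).
    All p-mers are assumed to have the same length p."""
    seq = seq.upper()
    p = len(pmers[0])
    windows = [seq[i:i + p] for i in range(len(seq) - p + 1)]
    counts = Counter(windows)
    total = max(len(windows), 1)  # avoid division by zero for short sequences
    return [counts[m] / total for m in pmers]


def centroid(vectors):
    """Coordinate-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest_centroid(vec, centroids):
    """Return the label whose centroid is closest in Euclidean distance,
    i.e. the label of the Voronoi cell that contains `vec`."""
    return min(centroids, key=lambda label: dist(vec, centroids[label]))


if __name__ == "__main__":
    # Toy data; the p-mers and sequences are made up for illustration only.
    pmers = ["AAA", "CCC"]
    training = {
        "X": ["AAAAAA", "AAAACC"],
        "Y": ["CCCCCC"],
    }
    cents = {
        label: centroid([pmer_feature_vector(s, pmers) for s in seqs])
        for label, seqs in training.items()
    }
    print(cents)  # {'X': [0.75, 0.0], 'Y': [0.0, 1.0]}
    print(nearest_centroid(pmer_feature_vector("AAAAAC", pmers), cents))  # X
```

In practice the feature set would be far larger and chosen from the structure of the DNA space itself; the toy example only shows how a centroid per species partitions feature space into Voronoi cells that can serve as species boundaries.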
Acknowledgements
We would like to thank the labs of Professors Nubia Matta and Fernando Garcia at the National University, and Duy Pham at the University of Memphis, for their work in collecting some of the black fly sample data used in this paper. Many thanks also go to the High Performance Computing Center (HPC) at the University of Memphis for the computing time used to calculate DNA space centroids, pmeric feature vectors and Voronoi diagrams.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Mainali, S., Garzon, M.H., Colorado, F.A. (2020). New Genomic Information Systems (GenISs): Species Delimitation and IDentification. In: Rojas, I., Valenzuela, O., Rojas, F., Herrera, L., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2020. Lecture Notes in Computer Science, vol. 12108. Springer, Cham. https://doi.org/10.1007/978-3-030-45385-5_15
DOI: https://doi.org/10.1007/978-3-030-45385-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45384-8
Online ISBN: 978-3-030-45385-5
eBook Packages: Computer Science, Computer Science (R0)