New Genomic Information Systems (GenISs): Species Delimitation and IDentification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 12108)

Abstract

Genomic Information Systems (GenISs) have recently been proposed as a universal framework for feature extraction, dimensionality reduction and more effective processing of genomic data. They are based on methodologies more firmly anchored in biochemical reality and exploit newly discovered structure in DNA spaces to extract and represent genomic data in compact data structures that are rich enough to answer critical questions about the original organisms, including phylogenies, species identification and, more recently, phenotypic information. They work from DNA sequence alone (possibly including full genomes), in a matter of minutes or hours, and produce answers consistent with well-established and accepted biological knowledge. Here, we introduce a second family of GenISs based on further structural properties of DNA spaces and demonstrate that they could also be used to provide principled, general and intuitive solutions to fundamental questions in biology such as “What exactly is a biological species?” Current answers to these all-important questions remain dependent on specific taxa and subject to analyst choices. We further discuss other applications to be explored in the future, including universal biological taxonomies in the quest for a truly universal and comprehensive “Atlas of Life”, as it is or as it could be on Earth.

Acknowledgements

We would like to thank the labs of Professors Nubia Matta and Fernando Garcia at the National University and Duy Pham at the University of Memphis for their work in collecting some of the blackfly sample data used in this paper. Many thanks also go to the High Performance Computing Center (HPC) at the University of Memphis for the computing time used to compute DNA space centroids, pmeric feature vectors and Voronoi diagrams.

Author information

Corresponding authors

Correspondence to Sambriddhi Mainali or Max H. Garzon.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Mainali, S., Garzon, M.H., Colorado, F.A. (2020). New Genomic Information Systems (GenISs): Species Delimitation and IDentification. In: Rojas, I., Valenzuela, O., Rojas, F., Herrera, L., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2020. Lecture Notes in Computer Science, vol 12108. Springer, Cham. https://doi.org/10.1007/978-3-030-45385-5_15

  • DOI: https://doi.org/10.1007/978-3-030-45385-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-45384-8

  • Online ISBN: 978-3-030-45385-5

  • eBook Packages: Computer Science, Computer Science (R0)
