ABSTRACT
Discovery and foundational understanding in the modern life sciences have largely come to depend on our ability to analyze large volumes of complex data from a diverse range of sources, capturing information encapsulated across the different layers of the nature-built system. While this data-centric approach has been the primary driver of computational life sciences and discovery pipelines for several decades, the field has shifted decisively in the last few years in how and why these data are collected. More specifically, in contrast to earlier genomic and other -omic projects, modern data collection by and large happens in an analysis-agnostic fashion: complex data are collected without any specific hypotheses driving the collection, simply because affordable high-throughput technologies are readily available. This has led to a fundamental shift in how we process these data and what we can glean from them.
In this work, we present a novel algorithmic and software framework called Hyppo-X, which uses algebraic topology to discover hidden structure within complex biological data sets [1, 3]. Topology is the field of mathematics that deals with structure at large. Computational topology and its applications constitute an emerging area of research with ample scope for development and data-driven discovery. We present results from our extensive collaborative studies in developing and applying our methods to analyze two types of data: plant phenomics data obtained from agricultural fields [2], and patient trajectories obtained from a network of hospitals in support of antimicrobial stewardship [4]. Topological data analysis holds tremendous promise for modeling and analyzing high-dimensional data sets in numerous scientific domains, and is likely to become part of future machine learning pipelines. These early studies demonstrate its potential while also highlighting a number of challenges and opportunities for future research.
The software is available for download at https://mhmethun.com/HYPPO-X/.
- Ananth Kalyanaraman, Methun Kamruzzaman, and Bala Krishnamoorthy. Interesting paths in the mapper complex. Journal of Computational Geometry, 10(1):500–531, 2019.
- Methun Kamruzzaman, Ananth Kalyanaraman, and Bala Krishnamoorthy. Detecting divergent subpopulations in phenomics data using interesting flares. In Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, pages 155–164, 2018.
- Methun Kamruzzaman, Ananth Kalyanaraman, Bala Krishnamoorthy, Stefan Hey, and Pat Schnable. Hyppo-X: A scalable exploratory framework for analyzing complex phenomics data. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2019.
- Kaniz Fatema Madhobi, Methun Kamruzzaman, Ananth Kalyanaraman, Eric Lofgren, Rebekah Moehring, and Bala Krishnamoorthy. A visual analytics framework for analysis of patient trajectories. In Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, pages 15–24, 2019.
Index Terms
- Scalable topological data analysis for life science applications