ABSTRACT
Extracting meaningful information from biological networks takes a long time in most applications, which makes these networks hard to process. This work focuses on speeding up the processing of biological networks, in particular the traversal of genomes that have been mapped into a network, in order to detect causal genes and their associated pathways. Inferring disease-causing genes and their pathways plays a crucial role in computational biology because it helps identify the major causal genes and the interactions that lead to a disease state, and it suggests new drug targets. In this work, Hadoop's distributed storage system is used to store the molecular interaction network. Graph-parallel processing with Hadoop MapReduce, combined with graph-theoretical approaches, is used to improve both the accuracy of the results and the execution time on benchmark data.
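As a rough illustration of how a graph traversal can be phrased as map and reduce phases, the sketch below runs a breadth-first search in plain Python, with no Hadoop involved. It is not the authors' algorithm: the function names, the record layout, and the toy node labels are all assumptions made for this example. Each round, the map phase lets every reached node propose a distance to its neighbours, and the reduce phase keeps the smallest proposal per node.

```python
from collections import defaultdict

INF = float("inf")  # marks nodes not yet reached from the source


def map_phase(node, record):
    """Re-emit the node's own data and propose a distance to each neighbour."""
    dist, neighbours = record
    yield node, ("graph", neighbours)   # carry the adjacency list forward
    yield node, ("dist", dist)          # carry the current distance forward
    if dist != INF:
        for nb in neighbours:
            yield nb, ("dist", dist + 1)


def reduce_phase(node, values):
    """Keep the adjacency list and the smallest distance proposed for a node."""
    neighbours, best = [], INF
    for kind, payload in values:
        if kind == "graph":
            neighbours = payload
        else:
            best = min(best, payload)
    return node, (best, neighbours)


def bfs_iteration(graph):
    """One simulated MapReduce round: map, group by key (shuffle), reduce."""
    grouped = defaultdict(list)
    for node, record in graph.items():
        for key, value in map_phase(node, record):
            grouped[key].append(value)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


def bfs(adjacency, source):
    """Repeat rounds until no distance changes, then return the distances."""
    graph = {n: (0 if n == source else INF, list(nbs))
             for n, nbs in adjacency.items()}
    while True:
        updated = bfs_iteration(graph)
        if updated == graph:
            return {n: d for n, (d, _) in updated.items()}
        graph = updated


# Hypothetical toy network; the labels do not refer to real genes.
toy_network = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": [], "E": []}
distances = bfs(toy_network, "A")
```

On a real cluster each round would be a separate MapReduce job that reads and writes the graph on distributed storage, so the number of rounds, which grows with the depth of the traversal, is a major factor in total running time.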
Index Terms
- On parallelizing graph theoretical approaches for identifying causal genes and pathways from very large biological networks