DOI: 10.1145/3018896.3056793
research-article

On parallelizing graph theoretical approaches for identifying causal genes and pathways from very large biological networks

Published: 22 March 2017

ABSTRACT

Extracting meaningful information from biological networks is computationally expensive in most applications, which makes these networks hard to process. This work focuses on speeding up the processing of biological networks, in particular the traversal of genomes that have been mapped onto a network, for the detection of causal genes and their associated pathways. Inferring disease-causing genes and their pathways has become crucial in computational biology because it helps identify the major causal genes and the interactions among them that lead to a disease state, and it suggests new drug targets. In this work, Hadoop's distributed storage system is used to store the molecular interaction network. Graph-parallel processing techniques of Hadoop MapReduce, in conjunction with graph-theoretical approaches, are used to improve the accuracy of the results and the execution time on benchmark data.


        • Published in

          ICC '17: Proceedings of the Second International Conference on Internet of things, Data and Cloud Computing
          March 2017
          1349 pages
          ISBN:9781450347747
          DOI:10.1145/3018896

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          ICC '17 Paper Acceptance Rate: 213 of 590 submissions, 36%. Overall Acceptance Rate: 213 of 590 submissions, 36%.
