research-article
DOI:10.1145/2808719.2812595

Scalable multipartite subgraph enumeration for integrative analysis of heterogeneous experimental functional genomics data

Published: 09 September 2015

Abstract

Functional genomics, the effort to understand the role of genomic elements in biological processes, has led to an avalanche of diverse experimental and semantic information defining associations between genes and various biological concepts across species and experimental paradigms. Integrating this rapidly expanding wealth of heterogeneous data, and finding consensus among so many diverse sources for specific research questions, require highly sophisticated big data structures and algorithms for harmonization and scalable analysis. In this context, multipartite graphs can often serve as useful structures for representing questions about the role of genes in multiple, frequently occurring disease processes. The main focus of this paper is on finding and analyzing efficient algorithms for dense subgraph enumeration in such graphs. An O(3^(n/3))-time procedure was devised to enumerate all maximal k-partite cliques in a k-partite graph, where k ≥ 3. The maximum number of such cliques is also shown to obey this bound, and thus this procedure obtains the best possible asymptotic performance. Empirical testing on both real and synthetic data is conducted. Concrete applications to biological data are described, as are scalability issues in the context of big data analysis.
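
To make the enumeration task in the abstract concrete, here is a minimal Python sketch. It is not the paper's procedure. It assumes one simple approach: join every pair of vertices inside the same partite set, enumerate the maximal cliques of the augmented graph with pivoted Bron-Kerbosch, and keep only the cliques that touch all k partite sets. The function name `maximal_k_partite_cliques`, the `parts`/`edges` input format, and the toy graph are hypothetical choices made for this illustration.

```python
from itertools import combinations


def maximal_k_partite_cliques(parts, edges):
    """Enumerate maximal k-partite cliques (illustrative sketch, assumed API).

    parts: list of disjoint vertex sets, one per partite set.
    edges: iterable of (u, v) pairs joining vertices in different partite sets.
    Returns a list of frozensets, each with at least one vertex from every part.
    """
    part_of = {v: i for i, part in enumerate(parts) for v in part}
    adj = {v: set() for v in part_of}
    for u, v in edges:
        if part_of[u] == part_of[v]:
            raise ValueError("edges must join different partite sets")
        adj[u].add(v)
        adj[v].add(u)

    # Assumed reduction: add all intrapartite edges, so that a clique of the
    # augmented graph is a vertex set that is complete across partite sets.
    for part in parts:
        for u, v in combinations(part, 2):
            adj[u].add(v)
            adj[v].add(u)

    k = len(parts)
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-explored vertices.
        if not p and not x:
            # r is maximal; keep it only if it touches every partite set.
            if len({part_of[v] for v in r}) == k:
                found.append(frozenset(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


# Toy tripartite graph (hypothetical data).
parts = [{"a1", "a2"}, {"b1"}, {"c1", "c2"}]
edges = [("a1", "b1"), ("a2", "b1"), ("a1", "c1"), ("a2", "c1"),
         ("b1", "c1"), ("a1", "c2"), ("b1", "c2")]
cliques = maximal_k_partite_cliques(parts, edges)
print(sorted(sorted(c) for c in cliques))
# prints [['a1', 'a2', 'b1', 'c1'], ['a1', 'b1', 'c1', 'c2']]
```

The pivoting rule used here is the one for which an O(3^(n/3)) worst-case bound on maximal clique enumeration is known, which is why it was chosen for the sketch. The filtering step discards maximal cliques of the augmented graph that miss a partite set, since those are not k-partite cliques.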

Published In

BCB '15: Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics
September 2015
683 pages
ISBN:9781450338530
DOI:10.1145/2808719
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. big data analytics
  2. dense subgraph enumeration
  3. life science applications
  4. multipartite graphs

Qualifiers

  • Research-article

Funding Sources

  • National Institutes of Health

Conference

BCB '15

Acceptance Rates

BCB '15 Paper Acceptance Rate 48 of 141 submissions, 34%;
Overall Acceptance Rate 254 of 885 submissions, 29%

Cited By

  • (2024) Interrelated Dense Pattern Detection in Multilayer Networks. IEEE Transactions on Knowledge and Data Engineering 36:11 (6462-6476). DOI:10.1109/TKDE.2024.3398683. Online publication date: Nov-2024
  • (2023) Ant Colony Optimization Algorithm for Finding the Maximum Number of d-Size Cliques in a Graph with Not All m Edges between Its d Parts. Dependable Computer Systems and Networks (255-264). DOI:10.1007/978-3-031-37720-4_23. Online publication date: 11-Aug-2023
  • (2019) On Finding and Enumerating Maximal and Maximum k-Partite Cliques in k-Partite Graphs. Algorithms 12:1 (23). DOI:10.3390/a12010023. Online publication date: 15-Jan-2019
  • (2018) Bipartite graphs in systems biology and medicine: a survey of methods and applications. GigaScience 7:4. DOI:10.1093/gigascience/giy014. Online publication date: 19-Feb-2018
