Abstract
Massive amounts of graph data are generated in many areas, including computational biology and social networks. Often these graphs have attributes associated with their nodes. One of the most intriguing questions about graphs representing complex data is how to find communities, or clusters. Using attribute data to find clusters has been shown to be effective in many application areas, e.g., finding subnetwork biomarkers for cancer prediction and targeting advertising at a group of friends in a social network. In this paper, we propose an algorithm for mining maximal dense cohesive clusters from node-attributed graphs. The number of reported maximal dense cohesive clusters can be very large when the constraints are relaxed; therefore, we also propose a post-processing algorithm for extracting a representative subset of these clusters. Experiments on real-world datasets show that the proposed approach is effective in mining meaningful biological clusters from a protein–protein interaction network with attributes extracted from gene expression datasets. Furthermore, the proposed approach outperforms competing algorithms in running time.
Acknowledgment
This study was supported in part by the National Science Foundation (NSF) award IIS-1423321.
Additional information
A. Goparaju and T. Brazier contributed equally to this research.
Cite this article
Goparaju, A., Brazier, T. & Salem, S. Mining representative maximal dense cohesive subnetworks. Netw Model Anal Health Inform Bioinforma 4, 29 (2015). https://doi.org/10.1007/s13721-015-0101-6