Identifying dense subgraphs in protein–protein interaction network for gene selection from microarray data

  • Original Article
Network Modeling Analysis in Health Informatics and Bioinformatics

Abstract

Selecting the genes responsible for a disease is a key task in bioinformatics. Microarray data are often used for this purpose, with differential expression taken as the cue. More recently, such expression data have been supplemented with gene ontology and gene/protein interaction networks. Functional knowledge and interaction structure have become critical for understanding biological processes, including the selection of genes potentially associated with complex diseases. In this paper, we propose an approach that combines expression analysis with structural analysis of protein–protein interaction networks to identify genes associated with complex diseases. The dense subgraph structures embedded in the networks are extracted. We present results on three different types of benchmark cancer datasets (prostate cancer, interstitial lung disease and chronic lymphocytic leukemia) and show that interesting biological information can be inferred, in addition to achieving high prediction accuracy. The proposed methodology helps to identify not only differentially expressed genes but also hub genes that are important in biological processes.
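The abstract does not say which dense subgraph algorithm is used, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes density is measured as edges per node and uses the well-known greedy peeling heuristic (repeatedly remove the lowest-degree node and remember the densest intermediate node set). The function name `densest_subgraph_peel` and the toy protein identifiers are hypothetical.

```python
from collections import defaultdict


def densest_subgraph_peel(edges):
    """Greedy peeling: repeatedly drop a minimum-degree node and keep the
    intermediate node set with the highest density |E| / |V|."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions in this sketch
            adj[u].add(v)
            adj[v].add(u)

    nodes = set(adj)
    n_edges = sum(len(nb) for nb in adj.values()) // 2
    best_density, best_nodes = 0.0, set(nodes)

    while nodes:
        density = n_edges / len(nodes)
        if density > best_density:
            best_density, best_nodes = density, set(nodes)
        # Sorting first makes tie-breaking between equal degrees deterministic.
        u = min(sorted(nodes), key=lambda x: len(adj[x]))
        for v in adj[u]:
            adj[v].discard(u)
        n_edges -= len(adj[u])
        nodes.remove(u)
        del adj[u]

    return best_nodes, best_density


# Hypothetical toy network: a 4-clique (A, B, C, D) with a sparse tail D-E-F.
toy_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("B", "D"), ("C", "D"),
    ("D", "E"), ("E", "F"),
]
dense_nodes, dense_density = densest_subgraph_peel(toy_edges)
print(sorted(dense_nodes), dense_density)
```

On the toy network the sparse tail is peeled away first, leaving the clique as the densest node set. In a setting like the one the abstract describes, the input edges might be restricted to interactions among differentially expressed genes, but that filtering step is an assumption and is not shown here.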

Notes

  1. http://www.ncbi.nlm.nih.gov/geo/.

  2. http://www.ncbi.nlm.nih.gov/geo/.

  3. http://www.hprd.org/.

  4. http://www.ebi.ac.uk/intact/.

  5. http://www.mint.bio.uniroma2.it/mint/.

  6. http://www.ncbi.nlm.nih.gov/gene/.

  7. http://llama.med.harvard.edu/funcassociate.

  8. http://www.kegg.jp.

Acknowledgments

We gratefully acknowledge financial support from CAPES, CNPq, FAPESP (Grant 2011/50761-2), FAPESP-Microsoft (Grant 2010/52138-8), eScience-PRP-USP and the Indo-Brazil Collaborative Project, DST, Govt. of India and Govt. of Brazil.

Author information

Corresponding author

Correspondence to David Correa Martins.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest and that this research did not involve human participants and/or animals.

About this article

Cite this article

Swarnkar, T., Simões, S.N., Anura, A. et al. Identifying dense subgraphs in protein–protein interaction network for gene selection from microarray data. Netw Model Anal Health Inform Bioinforma 4, 33 (2015). https://doi.org/10.1007/s13721-015-0104-3

  • DOI: https://doi.org/10.1007/s13721-015-0104-3

Keywords