Complex detection from PPI data using ensemble method

  • Original Article
  • Network Modeling Analysis in Health Informatics and Bioinformatics

Abstract

Many algorithms have been proposed recently to detect protein complexes in protein–protein interaction (PPI) networks. Most proteins form complexes to accomplish biological functions such as transcription of DNA, translation of mRNA and cell growth. Since proteins perform their tasks by interacting with each other, determining these protein–protein interactions is an important task. Traditional clustering approaches for protein complex identification cannot deal with noisy and incomplete PPI data, and they depend on information from a single source. Since noise in the interaction datasets hampers the accurate detection of protein complexes, we propose an ensemble approach for protein complex detection that attempts to incorporate information from Gene Ontology at the time of complex detection. Several baseline complex detection algorithms take the PPI network as input and generate protein complexes. The proposed ensemble then combines these complexes using a consensus building module to identify meaningful complexes. The complexes predicted by the ensemble are evaluated by comparing them to a set of gold standard protein complexes, and their biological relevance is established using a co-localization score.
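
The pipeline described above (baseline predictions, consensus building, comparison against gold standard complexes) can be illustrated with a minimal Python sketch. The abstract does not specify the consensus rule or the matching criterion, so everything here is an assumption for illustration: the overlap score |A ∩ B|² / (|A| · |B|), the 0.5 threshold, the majority vote on membership, and the function names `affinity`, `build_consensus` and `evaluate` are all hypothetical. The sketch does not cover the Gene Ontology information or the co-localization score.

```python
from collections import Counter


def affinity(a, b):
    """Overlap score |A ∩ B|^2 / (|A| * |B|) between two protein sets (assumed measure)."""
    common = len(a & b)
    return common ** 2 / (len(a) * len(b))


def build_consensus(predictions, min_support=2, threshold=0.5):
    """Hypothetical consensus rule: keep a complex only if at least
    `min_support` base methods predict a matching complex, then keep the
    proteins that appear in more than half of the matched complexes."""
    consensus = set()
    for i, complexes in enumerate(predictions):
        for c in complexes:
            matched = [c]
            for j, other in enumerate(predictions):
                if j == i or not other:
                    continue
                # Best-matching complex from another base method.
                best = max(other, key=lambda o: affinity(c, o))
                if affinity(c, best) >= threshold:
                    matched.append(best)
            if len(matched) < min_support:
                continue
            votes = Counter(p for m in matched for p in m)
            core = frozenset(p for p, n in votes.items() if 2 * n > len(matched))
            if len(core) >= 2:
                consensus.add(core)
    return consensus


def evaluate(predicted, gold, threshold=0.5):
    """Precision, recall and F-measure of predicted complexes against a gold standard."""
    hit_pred = sum(any(affinity(p, g) >= threshold for g in gold) for p in predicted)
    hit_gold = sum(any(affinity(g, p) >= threshold for p in predicted) for g in gold)
    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy data (made up): outputs of three base methods and a small gold standard.
method_a = [{"A", "B", "C"}, {"X", "Y"}]
method_b = [{"A", "B", "C", "D"}, {"P", "Q"}]
method_c = [{"A", "B"}, {"X", "Y", "Z"}]
gold = [{"A", "B", "C"}, {"X", "Y", "Z"}, {"M", "N"}]

consensus = build_consensus([method_a, method_b, method_c])
precision, recall, f_measure = evaluate(consensus, gold)
print(sorted(sorted(c) for c in consensus))
print(precision, recall, f_measure)
```

On this toy input the complex {P, Q} is dropped because only one base method supports it, which is the kind of noise filtering an ensemble is meant to provide. A real implementation would replace the majority vote with whatever consensus criterion the method actually defines.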

Author information

Correspondence to Sajid Nagi.

Cite this article

Nagi, S., Bhattacharyya, D.K. & Kalita, J.K. Complex detection from PPI data using ensemble method. Netw Model Anal Health Inform Bioinforma 6, 3 (2017). https://doi.org/10.1007/s13721-016-0144-3

  • DOI: https://doi.org/10.1007/s13721-016-0144-3
