Abstract
Many algorithms have recently been proposed to detect protein complexes in protein–protein interaction (PPI) networks. Most proteins form complexes to accomplish biological functions such as transcription of DNA, translation of mRNA and cell growth. Since proteins perform their tasks by interacting with each other, determining these protein–protein interactions is an important task. Traditional clustering approaches for protein complex identification cannot deal with noisy and incomplete PPI data and depend on information from a single source. Since noise in the interaction datasets hampers the accurate detection of protein complexes, we propose an ensemble approach for protein complex detection that attempts to incorporate information from Gene Ontology at the time of complex detection. The PPI network is taken as input by several baseline complex detection algorithms to generate protein complexes. These protein complexes are then combined by the proposed ensemble using a consensus building module to identify meaningful complexes. The complexes predicted by the ensemble are evaluated by comparing them to a set of gold standard protein complexes, and their biological relevance is established using a co-localization score.
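The pipeline outlined above (baseline predictions, consensus building, comparison with gold standard complexes) can be illustrated with a minimal sketch. The abstract does not specify the consensus rule or the matching measure, so everything here is an assumption made for illustration: the co-membership voting rule, the function names `consensus_complexes` and `overlap_score`, the parameters `min_votes` and `min_size`, and the use of the neighborhood affinity score for matching. Gene Ontology information and the co-localization score are not modeled.

```python
from collections import defaultdict
from itertools import combinations


def consensus_complexes(predictions, min_votes=2, min_size=3):
    """Merge complexes from several baseline methods by co-membership voting.

    predictions: one entry per baseline method, each a list of protein sets.
    A protein pair is kept if at least `min_votes` methods place both proteins
    in the same complex; consensus complexes are the connected components of
    the kept pairs. This voting rule is an illustrative assumption, not the
    consensus module of the paper.
    """
    votes = defaultdict(int)
    for method_complexes in predictions:
        # Count each pair at most once per method, even if complexes overlap.
        pairs = set()
        for complex_ in method_complexes:
            pairs.update(combinations(sorted(complex_), 2))
        for pair in pairs:
            votes[pair] += 1

    # Build adjacency lists from pairs with enough support.
    adjacency = defaultdict(set)
    for (a, b), count in votes.items():
        if count >= min_votes:
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Collect connected components with an iterative depth-first search.
    seen, result = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) >= min_size:
            result.append(component)
    return result


def overlap_score(predicted, reference):
    """Neighborhood affinity: |A ∩ B|^2 / (|A| * |B|), assumed as the match measure."""
    shared = len(predicted & reference)
    return shared * shared / (len(predicted) * len(reference))
```

Each baseline method contributes one vote per protein pair, so a pair reported by a single noisy method is dropped unless another method confirms it. A predicted complex would then typically be counted as matching a gold standard complex when `overlap_score` exceeds a chosen threshold, which is likewise an assumption here.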
Cite this article
Nagi, S., Bhattacharyya, D.K. & Kalita, J.K. Complex detection from PPI data using ensemble method. Netw Model Anal Health Inform Bioinforma 6, 3 (2017). https://doi.org/10.1007/s13721-016-0144-3
DOI: https://doi.org/10.1007/s13721-016-0144-3