Abstract
The growing volume of molecular interaction data has prompted a considerable number of computational approaches for studying protein function in the context of networks. These algorithms, however, treat each functional class independently and therefore have difficulty assigning multiple functions to a protein simultaneously. We propose a new semi-supervised algorithm, called MCSL, that takes the correlations among functional categories into account and thereby improves performance significantly. The guiding intuition is that a protein can receive label information not only from its neighbors annotated with the same category in the functional-linkage network, but also from its partners labeled with other classes in the category network, provided their respective neighborhood topologies are a good match. We encode this intuition as a two-dimensional version of network-based learning with local and global consistency. Experiments on a Saccharomyces cerevisiae protein-protein interaction network show that, under 5-fold cross-validation, our algorithm outperforms four state-of-the-art methods on both 66 second-level and 77 informative MIPS functional categories. Furthermore, we make predictions for the 204 uncharacterized proteins; most of these assignments can be found directly in, or inferred indirectly from, the SGD database.
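The label-propagation idea summarized above can be illustrated with a small sketch. The first function is the standard closed form of learning with local and global consistency (Zhou et al.), F = (I - αS)^{-1} Y with S = D^{-1/2} W D^{-1/2}. The second function is only a hypothetical stand-in for a "two-dimensional" variant: it also smooths scores across a category-similarity matrix C. The abstract does not give the MCSL update rule, so the update below, the names `two_dim_propagate`, `alpha`, `beta`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def normalize(W):
    """Symmetric normalization S = D^{-1/2} W D^{-1/2}; isolated nodes keep zero rows."""
    d = W.sum(axis=1).astype(float)
    inv_sqrt = np.zeros_like(d)
    nonzero = d > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(d[nonzero])
    return W * inv_sqrt[:, None] * inv_sqrt[None, :]

def lgc(W, Y, alpha=0.9):
    """Local and global consistency, closed form: F = (I - alpha * S)^{-1} Y.
    Each category (column of Y) is propagated independently."""
    S = normalize(W)
    return np.linalg.solve(np.eye(W.shape[0]) - alpha * S, Y)

def two_dim_propagate(W, C, Y, alpha=0.6, beta=0.3, iters=200):
    """Hypothetical two-dimensional propagation (an assumption, not the paper's rule):
    F <- alpha * S_p F + beta * F S_c + (1 - alpha - beta) * Y,
    where S_p smooths over proteins and S_c smooths over categories."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for the iteration to converge")
    Sp, Sc = normalize(W), normalize(C)
    F = Y.astype(float).copy()
    for _ in range(iters):
        F = alpha * Sp @ F + beta * F @ Sc + (1 - alpha - beta) * Y
    return F

# Toy data (made up): four proteins on a path 0-1-2-3, two correlated categories.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
C = np.array([[0, 1],
              [1, 0]], dtype=float)
Y = np.array([[1, 0],   # protein 0 is annotated with category 0
              [0, 0],   # proteins 1 and 2 are unannotated
              [0, 0],
              [0, 1]], dtype=float)  # protein 3 is annotated with category 1

F_independent = lgc(W, Y)
F_correlated = two_dim_propagate(W, C, Y)
```

In the independent version, an unannotated protein's score for a category depends only on proteins carrying that same category. In the sketched correlated version, a score can also flow in from a related category through `C`, which is the kind of cross-category information flow the abstract describes.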
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jiang, J.Q. (2011). Multi-label Correlated Semi-supervised Learning for Protein Function Prediction. In: Chen, J., Wang, J., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2011. Lecture Notes in Computer Science, vol 6674. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21260-4_35
DOI: https://doi.org/10.1007/978-3-642-21260-4_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21259-8
Online ISBN: 978-3-642-21260-4
eBook Packages: Computer Science, Computer Science (R0)