Abstract
As high-throughput proteome data accumulate, automated annotation of protein function has become one of the most challenging problems of the post-genomic era. To address this challenge, we propose a novel functional annotation framework that combines manifold embedding with multi-label classification to predict protein function on a protein-protein interaction (PPI) network. Unlike existing approaches that depend on the original network, our method weights the network by edge betweenness and simultaneously embeds the annotated and unannotated proteins into a Euclidean metric space via isometric feature mapping (ISOMAP). These low-dimensional coordinates then serve as a quantitative representation of each protein, and functional assignment is transformed into a multi-label classification problem. The approach yields a set of feasible functional labels for each unannotated protein. We conduct extensive experiments on a yeast PPI database to evaluate the performance of different multi-label learning methods. The results demonstrate that the proposed method is an effective tool for protein function prediction.
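The pipeline described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the betweenness value of an edge is used directly as its length, the ISOMAP neighbourhood graph is taken to be the PPI network itself (assumed connected, with proteins numbered 0..n-1), classical MDS is computed by power iteration, and a plain majority vote over the k nearest annotated proteins stands in for the multi-label learners evaluated in the paper. The function names and the labels F1 to F3 are hypothetical.

```python
from collections import deque
import math

def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    eb = {}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:  # BFS that also counts shortest paths (sigma)
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0.0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):  # accumulate path dependencies onto edges
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = (min(v, w), max(v, w))
                eb[edge] = eb.get(edge, 0.0) + c
                delta[v] += c
    return {e: b / 2.0 for e, b in eb.items()}  # each node pair was counted twice

def geodesic_distances(n, weights):
    """Floyd-Warshall shortest paths on the weighted graph (nodes 0..n-1)."""
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (u, v), w in weights.items():
        d[u][v] = d[v][u] = w
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def classical_mds(d, dim=2, iters=500):
    """Classical MDS on a distance matrix; eigenvectors via power iteration."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    row_mean = [sum(r) / n for r in sq]
    total = sum(row_mean) / n
    # Double-centred Gram matrix B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dim for _ in range(n)]
    for k in range(dim):
        vec = [math.sin(i + k + 1) for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            nxt = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt)) or 1.0
            vec = [x / norm for x in nxt]
        lam = sum(vec[i] * sum(b[i][j] * vec[j] for j in range(n)) for i in range(n))
        for i in range(n):
            coords[i][k] = math.sqrt(max(lam, 0.0)) * vec[i]  # negative eigenvalues give 0
        b = [[b[i][j] - lam * vec[i] * vec[j] for j in range(n)] for i in range(n)]  # deflate
    return coords

def knn_multilabel(coords, labels, query, k=3):
    """Keep every label carried by more than half of the k nearest annotated proteins."""
    nearest = sorted(labels, key=lambda i: math.dist(coords[i], coords[query]))[:k]
    votes = {}
    for i in nearest:
        for lab in labels[i]:
            votes[lab] = votes.get(lab, 0) + 1
    return {lab for lab, c in votes.items() if c > len(nearest) / 2}

# Toy network: two triangles (0,1,2) and (3,4,5) joined by the bridge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = {0: {"F1"}, 1: {"F1", "F2"}, 4: {"F3"}, 5: {"F3"}}  # hypothetical annotations
weights = edge_betweenness(adj)  # assumption: edge length = edge betweenness
coords = classical_mds(geodesic_distances(len(adj), weights), dim=2)
predicted = {q: knn_multilabel(coords, labels, q, k=2) for q in (2, 3)}
print(predicted)
```

On this toy network the bridge 2-3 receives the largest betweenness, so the two triangles are pushed apart in the embedding; the unannotated protein 2 is assigned F1 and protein 3 is assigned F3. Treating high-betweenness edges as long is one plausible reading of the weighting step, and a different mapping from betweenness to edge length would change the geometry.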
Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 61402002), the National Science Foundation of Anhui Province (No. 1408085QF120), and the Key Foundation of Natural Science Research for Institution of Higher Education of Anhui Province (No. KJ2013A007).
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liang, H., Sun, D., Ding, Z., Ge, M. (2015). Protein Function Prediction Using Multi-label Learning and ISOMAP Embedding. In: Gong, M., Linqiang, P., Tao, S., Tang, K., Zhang, X. (eds) Bio-Inspired Computing -- Theories and Applications. BIC-TA 2015. Communications in Computer and Information Science, vol 562. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49014-3_23
DOI: https://doi.org/10.1007/978-3-662-49014-3_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49013-6
Online ISBN: 978-3-662-49014-3
eBook Packages: Computer Science, Computer Science (R0)