A Preliminary Study on the Prediction of Human Protein Functions

  • Conference paper
Foundations on Natural and Artificial Computation (IWINAC 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6686)

Abstract

In the human proteome, about 5,000 proteins lack experimentally validated functional information. In this work we address human protein function prediction with three distinct supervised learning schemes: one-versus-all classification, tournament learning, and multi-label learning. The target values of the supervised models are the nodes of a subset of the Gene Ontology, which is widely used as a benchmark for functional prediction. On an independent dataset that includes very difficult cases, average recall over the first 50 ranked predictions was reasonable; average precision, however, was quite low.
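
The evaluation mentioned in the abstract (recall and precision over the first 50 ranked predictions) can be illustrated with a minimal sketch. It assumes each Gene Ontology term receives a numeric score per protein, for instance from one classifier per term in a one-versus-all scheme; the function name, the score dictionary and the toy term identifiers are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set, Tuple


def precision_recall_at_k(
    scores: Dict[str, float], true_terms: Set[str], k: int = 50
) -> Tuple[float, float]:
    """Precision and recall over the k highest-scoring terms of one protein.

    `scores` maps a term identifier to a predicted score (hypothetical input);
    `true_terms` holds the annotated terms of the same protein.
    """
    # Rank terms by decreasing score and keep the first k.
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for term in ranked if term in true_terms)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(true_terms) if true_terms else 0.0
    return precision, recall


# Toy example with made-up term identifiers and scores.
toy_scores = {"GO:A": 0.9, "GO:B": 0.8, "GO:C": 0.1, "GO:D": 0.05}
toy_truth = {"GO:A", "GO:C"}
print(precision_recall_at_k(toy_scores, toy_truth, k=2))
```

Under this definition, a protein with only a few annotated terms cannot reach high precision at a cutoff of 50, even when every annotated term is retrieved. This is one possible reading of how reasonable recall and low precision can coexist; the paper's exact evaluation protocol may differ.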

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bologna, G., Veuthey, AL., Pagni, M., Lane, L., Bairoch, A. (2011). A Preliminary Study on the Prediction of Human Protein Functions. In: Ferrández, J.M., Álvarez Sánchez, J.R., de la Paz, F., Toledo, F.J. (eds) Foundations on Natural and Artificial Computation. IWINAC 2011. Lecture Notes in Computer Science, vol 6686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21344-1_35

  • DOI: https://doi.org/10.1007/978-3-642-21344-1_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21343-4

  • Online ISBN: 978-3-642-21344-1

  • eBook Packages: Computer Science, Computer Science (R0)
