
Analysis of Informative Features for Negative Selection in Protein Function Prediction

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2017)

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 10209))


Abstract

Negative examples in automated protein function prediction (AFP), that is, proteins known not to possess a given function, are usually not stored directly in public proteome and genome databases, such as the Gene Ontology database. Nevertheless, most computational methods need negative examples to infer new predictions. A variety of algorithms have been proposed in AFP for negative selection, ranging from network- and feature-based heuristics to hierarchy-based and hierarchy-less strategies. Moreover, several bio-molecular information sources about proteins, such as gene co-expression, genetic interaction, and protein-protein interaction data, are naturally encoded in protein networks, where nodes are proteins and edges connect proteins sharing common characteristics. Although selecting negatives in biological networks is therefore a central and challenging problem in computational biology, identifying the characteristics a protein should have to be considered a negative is still a difficult task. In this work, we show that a few protein features extracted from the network help in detecting reliable negatives. We tested these features in two real-world experiments: predicting unreliable negatives with an SVM classifier through temporal holdout on model organisms for AFP, and selecting reliable negatives with a state-of-the-art clustering-based negative selection procedure.
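As a rough illustration of what "features extracted from the network" can look like, the sketch below builds a toy weighted protein network and computes two simple per-node quantities: the weighted degree and the total edge weight towards known positives. The abstract does not name the features actually used in the paper, so these two features, the protein identifiers, the edge weights, and the ranking heuristic are all assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple

# Toy undirected protein network; names and weights are invented.
EDGES: Dict[Tuple[str, str], float] = {
    ("P1", "P2"): 0.9,
    ("P1", "P3"): 0.4,
    ("P2", "P3"): 0.7,
    ("P3", "P4"): 0.2,
    ("P4", "P5"): 0.8,
}
# Proteins assumed to be annotated with the function of interest.
POSITIVES: Set[str] = {"P1", "P2"}


def build_adjacency(edges: Dict[Tuple[str, str], float]) -> Dict[str, Dict[str, float]]:
    """Turn an edge list into a symmetric adjacency map."""
    adj: Dict[str, Dict[str, float]] = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def node_features(
    adj: Dict[str, Dict[str, float]], positives: Set[str]
) -> Dict[str, Tuple[float, float]]:
    """Return (weighted degree, weight towards positives) for each node.

    These two features are hypothetical stand-ins, not the paper's feature set.
    """
    feats: Dict[str, Tuple[float, float]] = {}
    for node, nbrs in adj.items():
        weighted_degree = sum(nbrs.values())
        positive_weight = sum(w for n, w in nbrs.items() if n in positives)
        feats[node] = (weighted_degree, positive_weight)
    return feats


def rank_negative_candidates(
    feats: Dict[str, Tuple[float, float]], positives: Set[str]
) -> List[str]:
    """Order non-positive proteins by increasing connectivity to positives.

    Assumed heuristic: weakly connected proteins are safer negative candidates.
    Ties are broken alphabetically so the ordering is deterministic.
    """
    candidates = [n for n in feats if n not in positives]
    return sorted(candidates, key=lambda n: (feats[n][1], n))


ADJ = build_adjacency(EDGES)
FEATS = node_features(ADJ, POSITIVES)
RANKING = rank_negative_candidates(FEATS, POSITIVES)
```

In this toy network, `P3` is linked to both positives and therefore ends up last in `RANKING`, while `P4` and `P5` have no direct link to a positive and are ranked first. Feature vectors like those in `FEATS` could in principle be passed to a classifier such as an SVM, but how the paper does this is not described in the abstract.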





Correspondence to Dario Malchiodi.


© 2017 Springer International Publishing AG

About this paper

Cite this paper

Frasca, M., Lipreri, F., Malchiodi, D. (2017). Analysis of Informative Features for Negative Selection in Protein Function Prediction. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10209. Springer, Cham. https://doi.org/10.1007/978-3-319-56154-7_25


  • DOI: https://doi.org/10.1007/978-3-319-56154-7_25


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56153-0

  • Online ISBN: 978-3-319-56154-7

  • eBook Packages: Computer Science, Computer Science (R0)
