Handling Missing Features with Boosting Algorithms for Protein–Protein Interaction Prediction

  • Conference paper
Data Integration in the Life Sciences (DILS 2010)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6254)

Abstract

Combining information from multiple heterogeneous data sources can aid the prediction of protein–protein interactions (PPIs). This information can be arranged into a feature vector for classification; however, missing values in the data can reduce prediction accuracy. Boosting has emerged as a powerful tool for feature selection and classification. Bayesian methods have traditionally been used to cope with missing data, with boosting then applied to the output of the Bayesian classifiers. We explore a variation of Adaboost that deals with missing values at the level of the boosting algorithm itself, without the need for any density estimation step. Experiments on a publicly available PPI dataset suggest that this simpler and mathematically coherent approach may be more accurate than boosting the output of Bayesian classifiers.
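The abstract does not spell out the algorithm, so the following is only a hedged illustration and not the authors' method. It shows one way boosting can absorb missing features without any density-estimation step: a decision stump abstains (votes 0) whenever its feature is missing, and the vote weights follow the abstaining-hypothesis variant of confidence-rated AdaBoost described by Schapire and Singer. Encoding missing values as `None`, the helper names `stump_vote`, `train` and `predict`, and the smoothing constant `eps` are all assumptions of this sketch.

```python
import math
from typing import List, Optional, Tuple

# Assumption of this sketch: a missing feature value is encoded as None.
Vector = List[Optional[float]]
Stump = Tuple[float, int, float]  # (alpha, feature index, threshold)


def stump_vote(x: Vector, feature: int, threshold: float) -> int:
    """+1 or -1 when the feature is observed; 0 (abstain) when it is missing."""
    value = x[feature]
    if value is None:
        return 0
    return 1 if value > threshold else -1


def train(X: List[Vector], y: List[int], rounds: int = 10,
          eps: float = 1e-6) -> List[Stump]:
    """Confidence-rated AdaBoost with abstaining stumps; labels y are +1/-1."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    model: List[Stump] = []
    for _ in range(rounds):
        best = None  # (Z, feature, threshold, W+, W-)
        for j in range(d):
            observed = sorted({x[j] for x in X if x[j] is not None})
            if not observed:
                continue  # feature never observed: nothing to split on
            # Candidate thresholds: each observed value, plus one below the minimum.
            for t in [observed[0] - 1.0] + observed:
                w_plus = w_minus = w_zero = 0.0
                for xi, yi, wi in zip(X, y, w):
                    h = stump_vote(xi, j, t)
                    if h == 0:
                        w_zero += wi   # abstained: value is missing
                    elif h == yi:
                        w_plus += wi   # correct vote
                    else:
                        w_minus += wi  # wrong vote
                # Normalization factor Z, minimized greedily when picking a stump.
                z = w_zero + 2.0 * math.sqrt(w_plus * w_minus)
                if best is None or z < best[0]:
                    best = (z, j, t, w_plus, w_minus)
        if best is None:
            break
        _, j, t, w_plus, w_minus = best
        # Smoothed optimal vote weight; a negative alpha simply flips the stump.
        alpha = 0.5 * math.log((w_plus + eps) / (w_minus + eps))
        model.append((alpha, j, t))
        # Reweight: abstentions (vote 0) keep their weight before normalization.
        w = [wi * math.exp(-alpha * yi * stump_vote(xi, j, t))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model: List[Stump], x: Vector) -> int:
    """Sign of the weighted vote; abstaining stumps contribute nothing."""
    score = sum(alpha * stump_vote(x, j, t) for alpha, j, t in model)
    return 1 if score > 0 else -1
```

Because an abstaining stump leaves the weight of an example with a missing value unchanged, the relative weight of such examples grows, which pushes later rounds toward features that are observed for them.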

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Smeraldi, F., Defoin-Platel, M., Saqi, M. (2010). Handling Missing Features with Boosting Algorithms for Protein–Protein Interaction Prediction. In: Lambrix, P., Kemp, G. (eds) Data Integration in the Life Sciences. DILS 2010. Lecture Notes in Computer Science, vol 6254. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15120-0_11

  • DOI: https://doi.org/10.1007/978-3-642-15120-0_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15119-4

  • Online ISBN: 978-3-642-15120-0

  • eBook Packages: Computer Science, Computer Science (R0)
