Handling Missing Features with Boosting Algorithms for Protein–Protein Interaction Prediction

Smeraldi, Fabrizio; Defoin-Platel, Michael; Saqi, Mansoor

doi:10.1007/978-3-642-15120-0_11

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6254)

Included in the following conference series:

International Conference on Data Integration in the Life Sciences

Abstract

Combining information from multiple heterogeneous data sources can aid the prediction of protein–protein interactions. This information can be arranged into a feature vector for classification. However, missing values in the data can affect prediction accuracy. Boosting has emerged as a powerful tool for feature selection and classification. Bayesian methods have traditionally been used to cope with missing data, with boosting applied to the output of Bayesian classifiers. We explore a variation of AdaBoost that deals with missing values at the level of the boosting algorithm itself, without the need for any density estimation step. Experiments on a publicly available PPI dataset suggest that this simpler and mathematically coherent approach may be more accurate.
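The abstract does not spell out the variation of AdaBoost used. As an illustration only, the sketch below shows one well-known way a boosting algorithm can absorb missing values without a density estimation step: weak learners that abstain (output 0) whenever their feature is missing, in the style of confidence-rated boosting. It should not be read as the authors' algorithm. The function names, the decision-stump weak learner, the `eps` smoothing constant, and the toy data are all assumptions made for this example.

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """Decision stump that abstains (returns 0) where the feature is NaN."""
    col = X[:, feature]
    observed = ~np.isnan(col)
    out = np.zeros(col.shape[0])
    out[observed] = np.where(col[observed] > threshold, polarity, -polarity)
    return out

def fit_abstaining_adaboost(X, y, n_rounds=10, eps=1e-10):
    """Boost abstaining stumps; y must contain labels in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)          # uniform initial example weights
    ensemble = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):
            observed = X[~np.isnan(X[:, j]), j]
            for t in np.unique(observed):
                for s in (1.0, -1.0):
                    h = stump_predict(X, j, t, s)
                    w_plus = w[h * y > 0].sum()    # weight classified correctly
                    w_minus = w[h * y < 0].sum()   # weight classified wrongly
                    w_zero = w[h == 0].sum()       # weight where the stump abstains
                    # Normaliser minimised in confidence-rated boosting
                    z = w_zero + 2.0 * np.sqrt(w_plus * w_minus)
                    if best is None or z < best[0]:
                        best = (z, j, t, s, w_plus, w_minus)
        if best is None:
            raise ValueError("every feature value is missing")
        _, j, t, s, w_plus, w_minus = best
        # eps (an assumption of this sketch) avoids division by zero
        alpha = 0.5 * np.log((w_plus + eps) / (w_minus + eps))
        ensemble.append((alpha, j, t, s))
        # Abstained examples (h = 0) keep their weight before renormalisation
        w = w * np.exp(-alpha * y * stump_predict(X, j, t, s))
        w = w / w.sum()
    return ensemble

def predict(ensemble, X):
    """Sign of the weighted vote; 0 if every stump abstains on an example."""
    score = np.zeros(X.shape[0])
    for alpha, j, t, s in ensemble:
        score += alpha * stump_predict(X, j, t, s)
    return np.sign(score)

# Hypothetical toy data: two features, each missing for some examples.
X_toy = np.array([[1.0, np.nan], [2.0, 0.1], [np.nan, 0.9],
                  [4.0, 0.8], [np.nan, 0.2], [5.0, np.nan]])
y_toy = np.array([-1.0, -1.0, 1.0, 1.0, -1.0, 1.0])
ensemble_toy = fit_abstaining_adaboost(X_toy, y_toy, n_rounds=2)
```

In this sketch, examples on which a stump abstains keep their weight in the update, so after renormalisation they carry relatively more weight and later rounds tend to select stumps on features that are observed for them. No imputation or density model is involved, which is the property the abstract emphasises.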

Author information

Authors and Affiliations

School of Electronic Engineering and Computer Science, Queen Mary University of London, Mile End Road, London, UK, E1 4NS
Fabrizio Smeraldi
Rothamsted Research, Biomathematics and Bioinformatics, Harpenden, UK, AL5 2JQ
Michael Defoin-Platel & Mansoor Saqi

Editor information

Editors and Affiliations

Department of Computer and Information Science, Linköpings universitet, 581 83, Linköping, Sweden
Patrick Lambrix
Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, 412 96, Gothenburg, Sweden
Graham Kemp

Cite this paper

Smeraldi, F., Defoin-Platel, M., Saqi, M. (2010). Handling Missing Features with Boosting Algorithms for Protein–Protein Interaction Prediction. In: Lambrix, P., Kemp, G. (eds) Data Integration in the Life Sciences. DILS 2010. Lecture Notes in Computer Science, vol 6254. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15120-0_11

DOI: https://doi.org/10.1007/978-3-642-15120-0_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15119-4
Online ISBN: 978-3-642-15120-0
eBook Packages: Computer Science, Computer Science (R0)