Combining One-Class Classification Models Based on Diverse Biological Data for Prediction of Protein-Protein Interactions

Reyes, José A.; Gilbert, David

doi:10.1007/978-3-540-69828-9_18

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5109)

Included in the following conference series:

International Workshop on Data Integration in the Life Sciences

Abstract

This research addresses the prediction of protein-protein interactions (PPI) through the integration of diverse biological data. The Gold Standard data sets frequently used for this task contain a high proportion of instances involving ribosomal proteins. We demonstrate that this composition biases the classification results and that predicting non-ribosomal PPI is a much more difficult task. To improve performance on this subtask, we integrated additional biological data into the classification process, including data from mRNA expression experiments and protein secondary structure information. Furthermore, we investigated several strategies for combining diverse one-class classification (OCC) models generated from different subsets of biological data. The weighted average combination approach gives the best results, significantly improving on the performance of every single classification model evaluated.
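
As an illustration of the weighted average combination mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes each one-class model has already produced a score in [0, 1] for every candidate protein pair, with higher meaning "more likely to interact". The function names, the data-source names, the weights, and the 0.5 threshold are all hypothetical choices made for the example; the abstract does not state how the weights were derived.

```python
from typing import List, Sequence


def weighted_average(scores: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Combine per-model scores into one score per candidate protein pair.

    scores[m][i] is the score that model m assigns to pair i (assumed to be
    scaled to [0, 1]); weights[m] is a non-negative weight for model m.
    """
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_pairs = len(scores[0])
    if any(len(s) != n_pairs for s in scores):
        raise ValueError("every model must score the same pairs")
    # Normalise so the combined score stays on the same scale as the inputs.
    norm = [w / total for w in weights]
    return [sum(w * s[i] for w, s in zip(norm, scores))
            for i in range(n_pairs)]


def predict(scores: Sequence[Sequence[float]],
            weights: Sequence[float],
            threshold: float = 0.5) -> List[bool]:
    """Label a pair as interacting when its combined score reaches the threshold."""
    return [c >= threshold for c in weighted_average(scores, weights)]


# Hypothetical scores from three models trained on different data subsets.
expression_scores = [0.9, 0.2, 0.6]   # mRNA expression features
structure_scores = [0.7, 0.4, 0.3]    # secondary structure features
annotation_scores = [0.8, 0.1, 0.3]   # any further data source

all_scores = [expression_scores, structure_scores, annotation_scores]
model_weights = [2.0, 1.0, 1.0]       # e.g. set from validation performance

combined = weighted_average(all_scores, model_weights)
labels = predict(all_scores, model_weights)
```

One plausible way to choose the weights is to make them proportional to each model's performance on held-out data, so that a weaker model contributes less to the combined score; this is an assumption of the sketch rather than a detail taken from the chapter.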

Author information

Authors and Affiliations

Bioinformatics Research Centre, Department of Computing Science, University of Glasgow, Glasgow, UK, G12 8QQ
José A. Reyes & David Gilbert
Facultad de Ingeniería, Universidad de Talca, Chile
José A. Reyes

Editor information

Amos Bairoch, Sarah Cohen-Boulakia, Christine Froidevaux

Cite this paper

Reyes, J.A., Gilbert, D. (2008). Combining One-Class Classification Models Based on Diverse Biological Data for Prediction of Protein-Protein Interactions. In: Bairoch, A., Cohen-Boulakia, S., Froidevaux, C. (eds) Data Integration in the Life Sciences. DILS 2008. Lecture Notes in Computer Science, vol 5109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69828-9_18

DOI: https://doi.org/10.1007/978-3-540-69828-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69827-2
Online ISBN: 978-3-540-69828-9
eBook Packages: Computer Science, Computer Science (R0)
