Abstract
Protein phosphorylation is a crucial regulatory mechanism in various organisms. With recent improvements in mass spectrometry, phosphorylation site data are accumulating rapidly. Despite this wealth of data, computational prediction of phosphorylation sites remains challenging. This is particularly true in plants, because little is known about the substrate specificities of plant protein kinases and because current phosphorylation prediction tools are trained on kinase-specific phosphorylation data from non-plant organisms. In this paper, we propose a new machine learning approach for phosphorylation site prediction. We incorporate protein sequence information and protein disordered regions, and integrate the k-nearest neighbor and support vector machine techniques to predict phosphorylation sites. Tests on the PhosPhAt dataset of phosphoserines in Arabidopsis and the TAIR7 non-redundant protein database show that the proposed method performs well.
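As a rough illustration of the k-nearest neighbor step mentioned in the abstract, the following minimal sketch scores a candidate serine by how many of its nearest labelled sequence windows are known phosphosites. Everything here is an assumption made for illustration: the function names, the window half-width, the mismatch-fraction distance (a stand-in for whatever similarity measure the paper actually uses), and the toy data. The disorder features and the support vector machine stage are not shown.

```python
from typing import List, Tuple


def serine_windows(seq: str, half_width: int = 3) -> List[Tuple[int, str]]:
    """Return (position, window) for every serine; sequence ends are padded with '-'."""
    padded = "-" * half_width + seq + "-" * half_width
    windows = []
    for i, residue in enumerate(seq):
        if residue == "S":
            # After padding, the window centred on seq[i] starts at padded[i].
            windows.append((i, padded[i:i + 2 * half_width + 1]))
    return windows


def identity_distance(a: str, b: str) -> float:
    """Fraction of mismatched positions between two equal-length windows.

    Hypothetical stand-in for a substitution-matrix-based distance.
    """
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def knn_score(query: str, labelled: List[Tuple[str, int]], k: int = 3) -> float:
    """Fraction of positive labels (1) among the k nearest labelled windows."""
    ranked = sorted(labelled, key=lambda item: identity_distance(query, item[0]))
    nearest = ranked[:k]
    return sum(label for _, label in nearest) / len(nearest)


if __name__ == "__main__":
    # Toy labelled windows: 1 = phosphoserine, 0 = non-phosphorylated serine.
    training = [("ASK", 1), ("RSP", 1), ("GSG", 0), ("LSL", 0)]
    for position, window in serine_windows("MASPK", half_width=1):
        print(position, window, knn_score(window, training, k=2))
```

In a pipeline of this kind, scores computed for several values of k could serve as input features to a downstream classifier such as a support vector machine, alongside sequence and disorder features; that wiring is left out of the sketch.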
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gao, J., Agrawal, G.K., Thelen, J.J., Obradovic, Z., Dunker, A.K., Xu, D. (2009). A New Machine Learning Approach for Protein Phosphorylation Site Prediction in Plants. In: Rajasekaran, S. (eds) Bioinformatics and Computational Biology. BICoB 2009. Lecture Notes in Computer Science, vol 5462. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00727-9_4
DOI: https://doi.org/10.1007/978-3-642-00727-9_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00726-2
Online ISBN: 978-3-642-00727-9
eBook Packages: Computer Science, Computer Science (R0)