Abstract
Proper subcellular localization is critical for proteins to perform their cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations before it reaches its final destination. The pathway a protein follows is determined by carrier proteins that bind to specific sequence motifs. In this paper we present a new method that integrates sequence, motif and protein interaction data to model how proteins are sorted through these targeting pathways. We use a hidden Markov model (HMM) to represent protein targeting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model also identified new pathways and their putative carriers and motifs, which may represent novel protein sorting mechanisms.
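The abstract describes the model only at a high level. To illustrate the general idea, consider a protein that follows a path of sorting states from synthesis to a final compartment, where motifs and carrier interactions serve as evidence for each step. The minimal Python sketch below scores every root-to-leaf path in a toy tree of sorting states and normalizes the scores with Bayes' rule. It is not the authors' HMM or their learning procedure. The state names, the binary features (`SP`, `NLS`, `carrier_X`), all probabilities, and the rule that a state's emission probabilities override a shared background are hypothetical assumptions. Parameters are fixed by hand here rather than learned.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class SortingState:
    """A node in a hypothetical tree of sorting states (compartments)."""
    name: str
    # Hypothetical: P(feature present) for motifs/carrier interactions tied to
    # this step; these override the background for every path through this state.
    emit: dict[str, float] = field(default_factory=dict)
    children: list[SortingState] = field(default_factory=list)
    # Transition probabilities to each child, aligned with `children`.
    trans: list[float] = field(default_factory=list)


def log_likelihood(probs, features):
    """Independent Bernoulli log-likelihood of binary features (probs in (0, 1))."""
    return sum(
        math.log(p if features.get(f, False) else 1.0 - p) for f, p in probs.items()
    )


def path_scores(state, features, probs, log_prior=0.0, path=()):
    """Yield (root-to-leaf path, log joint probability) for every path."""
    probs = {**probs, **state.emit}  # this step's features override earlier values
    path = path + (state.name,)
    if not state.children:
        yield path, log_prior + log_likelihood(probs, features)
        return
    for child, t in zip(state.children, state.trans):
        yield from path_scores(child, features, probs, log_prior + math.log(t), path)


def path_posterior(root, features, background):
    """Posterior over sorting paths via Bayes' rule, normalized with log-sum-exp."""
    scores = dict(path_scores(root, features, background))
    m = max(scores.values())
    z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {path: math.exp(s - z) for path, s in scores.items()}


# Toy example: all names and numbers are made up for illustration.
background = {"SP": 0.1, "NLS": 0.1, "carrier_X": 0.1}
er = SortingState(
    "ER",
    {"SP": 0.9},
    children=[SortingState("secreted", {"carrier_X": 0.2}),
              SortingState("vacuole", {"carrier_X": 0.8})],
    trans=[0.5, 0.5],
)
root = SortingState(
    "start",
    children=[er, SortingState("nucleus", {"NLS": 0.9}), SortingState("cytoplasm")],
    trans=[0.4, 0.3, 0.3],
)
post = path_posterior(root, {"SP": True, "carrier_X": True}, background)
best = max(post, key=post.get)  # ('start', 'ER', 'vacuole'), posterior ≈ 0.785
```

In this toy setting, the predicted localization is the leaf of the highest-posterior path, and the intermediate states on that path form the inferred sorting route. Enumerating paths explicitly works only for small trees. A fitted model would also estimate the transition and emission parameters from data, for example with EM, rather than fixing them by hand.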
Supplementary results and software implementation are available from http://murphylab.web.cmu.edu/software/2010_RECOMB_pathways/
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lin, Th., Bar-Joseph, Z., Murphy, R.F. (2011). Learning Cellular Sorting Pathways Using Protein Interactions and Sequence Motifs. In: Bafna, V., Sahinalp, S.C. (eds) Research in Computational Molecular Biology. RECOMB 2011. Lecture Notes in Computer Science, vol 6577. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20036-6_20
DOI: https://doi.org/10.1007/978-3-642-20036-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20035-9
Online ISBN: 978-3-642-20036-6
eBook Packages: Computer Science, Computer Science (R0)