Abstract
The function of a protein is generally related to its subcellular localization, so knowing where a protein is located helps in understanding its potential functions and roles in biological processes. This work develops a hybrid method for computationally predicting the subcellular localization of eukaryotic proteins. The method, called EuLoc, combines a hidden Markov model (HMM) method, a homology search approach and a support vector machine (SVM) method, fusing several new features into Chou’s pseudo-amino acid composition. The SVM module overcomes a shortcoming of the homology search approach, which predicts poorly when a query protein has only low-homology or non-homologous matches in a database annotated with subcellular localization. The HMM modules overcome the shortcoming of the SVM in predicting subcellular localizations for which few protein sequences are available. Several features of a protein sequence are considered, including sequence-based features, biological features derived from PROSITE, NLSdb and Pfam, post-translational modification features and others. The overall accuracy and location accuracy of EuLoc are 90.5 and 91.2 %, respectively, a better predictive performance than that obtained by other tools. Although the amount of data for each subcellular location group in the benchmark dataset differs markedly, the accuracies of EuLoc across 12 subcellular localizations range from 82.5 to 100 %, indicating that this tool is much more balanced than other tools. EuLoc thus offers high, balanced predictive power for each subcellular localization. EuLoc is now available on the web at http://euloc.mbc.nctu.edu.tw/.
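As an illustration of the feature representation the abstract builds on, the following is a minimal sketch of the classic form of pseudo-amino acid composition: 20 amino acid frequencies followed by a few sequence-order correlation terms. It is not EuLoc's feature set, which the abstract says fuses additional features into this form. The function name `pseaac`, the parameters `lam` and `weight`, their default values, and the use of the Kyte-Doolittle hydropathy scale as the single residue property are all assumptions made for this example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used only as a stand-in residue property.
# Assumption: the abstract does not state which properties EuLoc uses.
HYDROPATHY = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def standardize(scale):
    """Rescale a property to zero mean and unit standard deviation over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {a: (scale[a] - mean) / std for a in AMINO_ACIDS}


def pseaac(sequence, lam=3, weight=0.05, scale=HYDROPATHY):
    """Return a (20 + lam)-dimensional pseudo-amino acid composition vector.

    lam    -- number of sequence-order correlation tiers (hypothetical default)
    weight -- weight of the correlation terms (hypothetical default)
    """
    seq = sequence.upper()
    if any(residue not in AMINO_ACIDS for residue in seq):
        raise ValueError("sequence contains a non-standard residue")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")

    prop = standardize(scale)

    # Conventional amino acid composition: frequency of each of the 20 residues.
    counts = Counter(seq)
    freqs = [counts[a] / len(seq) for a in AMINO_ACIDS]

    # Tier-k correlation factor: mean squared property difference between
    # residues that are k positions apart along the chain.
    thetas = []
    for k in range(1, lam + 1):
        pairs = len(seq) - k
        total = sum((prop[seq[i]] - prop[seq[i + k]]) ** 2 for i in range(pairs))
        thetas.append(total / pairs)

    # Joint normalization, so that the whole vector sums to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

For a protein sequence given as a one-letter string, `pseaac(seq, lam=3)` returns 23 values: the first 20 reflect composition and the last 3 reflect sequence order. A vector of this kind could then be passed to an SVM; how EuLoc combines it with its other features is not specified in the abstract.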


Acknowledgments
The authors would like to thank the National Science Council of the Republic of China for supporting this work under Grant Nos. NSC 101-2628-E-155-002-MY2, 99-2221-E-008-083-MY3, NSC 101-2311-B-009-003-MY3 and NSC 100-2627-B-009-002. This work was supported in part by the UST-UCSD International Center of Excellence in Advanced Bioengineering, sponsored by the Taiwan National Science Council I-RiCE Program under Grant Number NSC 101-2911-I-009-101, and by the Veterans General Hospitals and University System of Taiwan (VGHUST) Joint Research Program under Grant Number VGHUST101-G5-1-1. This work was also partially supported by MOE ATU.
Cite this article
Chang, TH., Wu, LC., Lee, TY. et al. EuLoc: a web-server for accurately predict protein subcellular localization in eukaryotes by incorporating various features of sequence segments into the general form of Chou’s PseAAC. J Comput Aided Mol Des 27, 91–103 (2013). https://doi.org/10.1007/s10822-012-9628-0