Improving Protein Localization Prediction Using Amino Acid Group Based Physichemical Encoding

  • Conference paper
Bioinformatics and Computational Biology (BICoB 2009)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5462)

Abstract

Computational prediction of protein localization is a common way to characterize the functions of newly sequenced proteins. Sequence features such as amino acid (AA) composition are widely used for subcellular localization prediction because of their simplicity, but they suffer from low coverage and low prediction accuracy. We present a physicochemical encoding method that maps protein sequences into feature vectors composed of the locations and lengths of amino acid groups (AAGs) with similar physicochemical properties. This high-level modular representation preserves the order information that is lost in the commonly used AA composition and AA pair composition encodings. Using SVM classifiers, we show that AAG-based features differentiate proteins of different localizations with higher prediction accuracy (up to 20% improvement) than the widely used AA composition and AA pair composition. Combining AAG and AA composition encodings improves prediction accuracy further, indicating a synergistic effect.
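
The contrast between an order-free composition encoding and a group-based encoding can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three-class residue grouping in `CLASSES`, the `min_len` threshold, and the three summary features per class (run count, mean relative start position, longest relative run length). The abstract only states that the features are built from the locations and lengths of AAGs.

```python
from itertools import groupby

# Assumed three-class grouping (hydrophobic / polar / charged).
# Illustrative only; the paper's actual grouping is not given in the abstract.
CLASSES = {
    "H": set("AVLIMFWC"),
    "P": set("GSTYNQP"),
    "C": set("DEKRH"),
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_class(aa):
    """Map one residue to its assumed class label, or 'X' if unknown."""
    for label, members in CLASSES.items():
        if aa in members:
            return label
    return "X"


def aa_composition(seq):
    """Order-free baseline: fraction of each of the 20 standard residues."""
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def amino_acid_groups(seq, min_len=2):
    """Return (class, start, length) for each run of same-class residues."""
    groups, pos = [], 0
    for label, run in groupby(seq, key=residue_class):
        length = len(list(run))
        if label != "X" and length >= min_len:
            groups.append((label, pos, length))
        pos += length
    return groups


def aag_features(seq, min_len=2):
    """Per class: run count, mean relative start, longest relative run length."""
    n = len(seq)
    groups = amino_acid_groups(seq, min_len)
    feats = []
    for label in CLASSES:
        runs = [(start, length) for c, start, length in groups if c == label]
        if runs:
            mean_start = sum(start for start, _ in runs) / len(runs) / n
            longest = max(length for _, length in runs) / n
            feats += [len(runs), mean_start, longest]
        else:
            feats += [0, 0.0, 0.0]
    return feats
```

Under these assumptions, `aa_composition("AAKK")` and `aa_composition("KAKA")` are identical, while `amino_acid_groups` separates the two sequences, which is the order information the abstract refers to. The resulting vectors could be passed to any SVM implementation, alone or concatenated with the composition vector.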

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hu, J., Zhang, F. (2009). Improving Protein Localization Prediction Using Amino Acid Group Based Physichemical Encoding. In: Rajasekaran, S. (eds) Bioinformatics and Computational Biology. BICoB 2009. Lecture Notes in Computer Science, vol 5462. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00727-9_24

  • DOI: https://doi.org/10.1007/978-3-642-00727-9_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00726-2

  • Online ISBN: 978-3-642-00727-9

  • eBook Packages: Computer Science, Computer Science (R0)
