Frequent Subsequence-Based Protein Localization

  • Conference paper
Data Mining for Biomedical Applications (BioDM 2006)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3916)

Abstract

Extracellular plant proteins are involved in numerous processes including nutrient acquisition, communication with other soil organisms, protection from pathogens, and resistance to disease and toxic metals. Because these proteins are strategically positioned to play a role in resistance to environmental stress, biologists are interested in proteomic tools for analyzing extracellular proteins. In this paper, we present three methods that use frequent subsequences of amino acids: one based on support vector machines (SVM), one based on boosting, and FSP, a new frequent subsequence pattern method. We test our methods on a plant dataset, and the experimental results show that they perform better than existing approaches based on amino acid composition.
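The general idea of using frequent subsequences as classification features can be illustrated with a minimal sketch. This is not the paper's FSP method or its feature construction, which the abstract does not specify. It assumes that a "subsequence" is a contiguous substring, that support is the fraction of sequences containing a pattern, and that features are binary presence indicators. The function names, toy sequences, and thresholds are all hypothetical.

```python
from collections import Counter


def frequent_substrings(sequences, min_len, max_len, min_support):
    """Return substrings found in at least `min_support` (a fraction) of sequences.

    Assumption: patterns are contiguous substrings; gapped subsequences
    are not handled in this sketch.
    """
    counts = Counter()
    for seq in sequences:
        seen = set()
        for k in range(min_len, max_len + 1):
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        # Count each pattern at most once per sequence (document frequency).
        counts.update(seen)
    threshold = min_support * len(sequences)
    return sorted(p for p, c in counts.items() if c >= threshold)


def to_feature_vector(seq, patterns):
    """Binary vector: 1 if the pattern occurs in the sequence, else 0."""
    return [1 if p in seq else 0 for p in patterns]


# Hypothetical toy amino acid sequences, not taken from the paper's dataset.
toy_sequences = ["MKTLLA", "MKTAAL", "GGMKTL", "AALLGG"]
patterns = frequent_substrings(toy_sequences, min_len=2, max_len=3, min_support=0.75)
features = [to_feature_vector(s, patterns) for s in toy_sequences]
```

Vectors like `features` could then be passed to any standard classifier, such as an SVM or a boosting learner. How the paper actually derives its features and trains each model is not described in the abstract.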

Research funded in part by the Alberta Ingenuity Funds and NSERC Canada.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zaïane, O.R., Wang, Y., Goebel, R., Taylor, G. (2006). Frequent Subsequence-Based Protein Localization. In: Li, J., Yang, Q., Tan, AH. (eds) Data Mining for Biomedical Applications. BioDM 2006. Lecture Notes in Computer Science, vol 3916. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11691730_5

  • DOI: https://doi.org/10.1007/11691730_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33104-9

  • Online ISBN: 978-3-540-33105-6

  • eBook Packages: Computer Science, Computer Science (R0)
