Abstract
Extracellular plant proteins are involved in numerous processes including nutrient acquisition, communication with other soil organisms, protection from pathogens, and resistance to disease and toxic metals. Because these proteins are strategically positioned to play a role in resistance to environmental stress, biologists are interested in proteomic tools for analyzing extracellular proteins. In this paper, we present three methods that use frequent subsequences of amino acids: one based on support vector machines (SVM), one based on boosting, and FSP, a new frequent subsequence pattern method. We test our methods on a plant dataset, and the experimental results show that they perform better than existing approaches based on amino acid composition.
Research funded in part by the Alberta Ingenuity Funds and NSERC Canada.
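To make the idea of frequent-subsequence features concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a "subsequence" is a contiguous run of amino acids and that "frequent" means occurring in at least a given fraction of the training sequences; the abstract specifies neither. The function names, the length bounds, and the toy sequences are hypothetical.

```python
from collections import Counter


def frequent_subsequences(sequences, min_support, min_len=2, max_len=4):
    """Return contiguous substrings found in at least `min_support`
    (a fraction between 0 and 1) of the input sequences.

    Assumption: subsequences are contiguous; the paper may define them differently.
    """
    counts = Counter()
    for seq in sequences:
        seen = set()
        for k in range(min_len, max_len + 1):
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        # Count each substring at most once per sequence (document frequency).
        counts.update(seen)
    threshold = min_support * len(sequences)
    return sorted(s for s, c in counts.items() if c >= threshold)


def to_feature_vector(seq, patterns):
    """Encode a sequence as binary presence/absence of each pattern."""
    return [1 if p in seq else 0 for p in patterns]


# Toy example with made-up sequences (not from the paper's dataset).
toy = ["MKTLLA", "MKTAAL", "GGMKTL"]
patterns = frequent_subsequences(toy, min_support=1.0, min_len=2, max_len=3)
vector = to_feature_vector("AAKTAA", patterns)
```

Binary vectors of this kind could then be passed to any standard classifier, such as an SVM or a boosting learner, which is one plausible reading of how the first two methods in the abstract consume the patterns.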
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zaïane, O.R., Wang, Y., Goebel, R., Taylor, G. (2006). Frequent Subsequence-Based Protein Localization. In: Li, J., Yang, Q., Tan, AH. (eds) Data Mining for Biomedical Applications. BioDM 2006. Lecture Notes in Computer Science, vol 3916. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11691730_5
DOI: https://doi.org/10.1007/11691730_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33104-9
Online ISBN: 978-3-540-33105-6
eBook Packages: Computer Science (R0)