
On using physico-chemical properties of amino acids in string kernels for protein classification via support vector machines

Journal of Systems Science and Complexity

Abstract

String kernels are popular tools for analyzing protein sequence data, and they have been applied successfully to many problems in computational biology. Traditional string kernels assume that different substrings are independent. However, substrings can be highly correlated because of their substructure relationships or shared physico-chemical properties. This paper proposes two kinds of weighted spectrum kernels: the correlation spectrum kernel and the AA spectrum kernel. We evaluate their performance by predicting glycan-binding proteins for 12 glycans. The results show that the correlation spectrum kernel and the AA spectrum kernel perform significantly better than the spectrum kernel for nearly all of the 12 glycans. By comparing the predictive power of AA spectrum kernels constructed from different physico-chemical properties, we can also identify the physico-chemical properties that contribute the most to glycan-protein binding. The results indicate that the physico-chemical properties of amino acids in proteins play an important role in the mechanism of glycan-protein binding.
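
To make the contrast in the abstract concrete, the following is a minimal Python sketch, not the paper's implementation. The function `spectrum_kernel` is the standard k-spectrum kernel (an inner product of k-mer counts), which treats distinct substrings as independent. The function `weighted_spectrum_kernel` is a generic weighted form in which distinct k-mers may also contribute through a similarity function; the exact weights of the correlation spectrum kernel and the AA spectrum kernel are not given in this abstract, so `property_similarity`, the Laplacian-style weighting, and the values in `TOY_PROPERTY` are illustrative assumptions only.

```python
from collections import Counter
from math import exp


def kmer_counts(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Standard k-spectrum kernel: only identical k-mers contribute."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(cx[s] * cy[s] for s in cx.keys() & cy.keys())


def weighted_spectrum_kernel(x, y, k, similarity):
    """Generic weighted spectrum kernel (illustrative form, an assumption).

    Every pair of k-mers (s, t) contributes count_x(s) * count_y(t),
    scaled by similarity(s, t). The result is a valid kernel when
    similarity is itself a positive semi-definite kernel on k-mers.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(cx[s] * cy[t] * similarity(s, t) for s in cx for t in cy)


# Made-up property values for three residues, for illustration only.
# These are NOT taken from AAindex or from the paper.
TOY_PROPERTY = {"A": 0.0, "C": 1.0, "D": 2.0}


def property_similarity(prop):
    """Hypothetical k-mer similarity built from one per-residue property.

    Uses a product of Laplacian kernels exp(-|p(a) - p(b)|) over aligned
    positions, so identical k-mers score 1 and dissimilar ones score less.
    """
    def sim(s, t):
        score = 1.0
        for a, b in zip(s, t):
            score *= exp(-abs(prop[a] - prop[b]))
        return score
    return sim


if __name__ == "__main__":
    x, y = "ACDAC", "ACCAD"
    print(spectrum_kernel(x, y, k=2))
    print(weighted_spectrum_kernel(x, y, 2, property_similarity(TOY_PROPERTY)))
```

With an identity similarity (1 for equal k-mers, 0 otherwise) the weighted form reduces to the plain spectrum kernel, which is one way to check such a sketch. A Gram matrix computed this way could be passed to an SVM library that accepts precomputed kernels.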



Author information


Corresponding author

Correspondence to Hao Jiang.

Additional information

This research was supported in part by the Research Grants Council of Hong Kong under Grant No. 17301214, HKU CERG Grants, the Hung Hing Ying Physical Research Grant, the Research Funds of Renmin University of China, and the National Natural Science Foundation of China under Grant Nos. 11271144, 11101382, 11471256, and S201201009985.

This paper was recommended for publication by Editor ZOU Guohua.


About this article


Cite this article

Li, L., Aoki-Kinoshita, K.F., Ching, WK. et al. On using physico-chemical properties of amino acids in string kernels for protein classification via support vector machines. J Syst Sci Complex 28, 504–516 (2015). https://doi.org/10.1007/s11424-015-2156-y


  • DOI: https://doi.org/10.1007/s11424-015-2156-y
