Identification of cytokine via an improved genetic algorithm

  • Research Article
  • Frontiers of Computer Science

Abstract

With the explosive growth in the number of protein sequences generated in the postgenomic age, research into identifying cytokines among proteins and detecting their biochemical mechanisms is becoming increasingly important. Unfortunately, identifying cytokines is challenging for two reasons: the structure space of proteins is poorly understood, and cytokines make up only a small fraction of a massive protein population. Because a protein sequence is conceptually similar to a mapping of words to meaning, the n-gram, a type of probabilistic language model, is explored to extract features from proteins. To address the second challenge, a genetic algorithm, a search heuristic that mimics the process of natural selection, is used to develop a classifier that overcomes the imbalance problem in protein data and produces precise predictions of cytokines among proteins. Experiments carried out on an imbalanced protein data set show that our methods outperform traditional algorithms in terms of prediction ability.
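The n-gram feature idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes a protein is a string over the 20 standard amino acid letters and that each feature is the relative frequency of one n-gram. The function name `ngram_features`, the default `n=2`, and the choice to skip n-grams containing non-standard residues are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acid one-letter codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_features(sequence, n=2):
    """Return relative n-gram frequencies in a fixed vocabulary order.

    Hypothetical helper: the vector has 20**n entries, one per possible
    n-gram over AMINO_ACIDS. N-grams containing other letters are ignored.
    """
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    # Slide a window of width n over the sequence and count each n-gram.
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    total = sum(counts[g] for g in vocab)
    # Normalize so sequences of different lengths are comparable.
    return [counts[g] / total if total else 0.0 for g in vocab]


features = ngram_features("ACAC", n=2)
```

A fixed-length vector of this kind can be passed to any standard classifier. How the paper combines such features with its genetic algorithm is not described in the abstract.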

Author information

Corresponding author

Correspondence to Quan Zou.

Additional information

Xiangxiang Zeng received his BS degree in automation from Hunan University, China in 2005, and his PhD in systems engineering from Huazhong University of Science and Technology, China in 2011. From 2010 to 2011 he spent one year working in the natural computing group at Seville University, Spain. He is currently an assistant professor in the Department of Computer Science, Xiamen University, China. His main research interests include membrane computing, neural computing and automaton theory.

Sisi Yuan is a Master's student in the Department of Computer Science at Xiamen University, China. She received her BS degree in software engineering from Hangzhou Dianzi University, China. Her research interests include data mining and bioinformatics.

Xianxian Huang is an undergraduate student in the Department of Computer Science at Xiamen University, China. His main research interests are data mining and bioinformatics.

Quan Zou is an associate professor of computer science at Xiamen University, China. He received his PhD degree from Harbin Institute of Technology, China in 2009. His research is in the areas of bioinformatics, machine learning and parallel computing. His current focus is on genome assembly, annotation, and functional analysis from next-generation sequencing data with parallel computing methods. Several related works have been published in Briefings in Bioinformatics, Bioinformatics, PLOS ONE, and IEEE/ACM Transactions on Computational Biology and Bioinformatics. He serves on many impactful journals and the National Natural Science Foundation of China.

About this article

Cite this article

Zeng, X., Yuan, S., Huang, X. et al. Identification of cytokine via an improved genetic algorithm. Front. Comput. Sci. 9, 643–651 (2015). https://doi.org/10.1007/s11704-014-4089-3

  • DOI: https://doi.org/10.1007/s11704-014-4089-3
