Identification of cytokine via an improved genetic algorithm

Zeng, Xiangxiang; Yuan, Sisi; Huang, Xianxian; Zou, Quan

doi:10.1007/s11704-014-4089-3

Research Article
Published: 03 November 2014

Volume 9, pages 643–651, (2015)
Frontiers of Computer Science

Xiangxiang Zeng¹, Sisi Yuan¹, Xianxian Huang¹ & Quan Zou¹

Abstract

With the explosive growth in the number of protein sequences generated in the postgenomic age, research into identifying cytokines among proteins and detecting their biochemical mechanisms is becoming increasingly important. Unfortunately, identifying cytokines is challenging for two reasons: the structure space of proteins is poorly understood, and cytokines make up only a small fraction of the massive number of known proteins. Because a protein sequence is conceptually similar to a mapping of words to meaning, the n-gram, a type of probabilistic language model, is explored to extract features from proteins. To address the second challenge, a genetic algorithm, a search heuristic that mimics the process of natural selection, is used to develop a classifier that overcomes the class imbalance among proteins and generates precise predictions of cytokines. Experiments carried out on an imbalanced protein data set show that our methods outperform traditional algorithms in prediction ability.
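
As an illustration of the n-gram idea mentioned in the abstract, the sketch below turns a protein sequence into a fixed-length vector of n-gram frequencies. It is only a minimal sketch under stated assumptions, not the feature definition used in the article: the function name `ngram_features`, the restriction to the 20 standard amino-acid letters, the normalization by the total window count, and the example sequence are all hypothetical choices made for this example.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino-acid letters form the vocabulary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_features(sequence, n=2):
    """Return the normalized frequency of every length-n residue window.

    Hypothetical helper for illustration; the article may define its
    n-gram features differently.
    """
    # All possible n-grams over the alphabet, in a fixed order (20**n entries).
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    # Overlapping windows of length n taken along the sequence.
    windows = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    counts = Counter(windows)
    # Windows containing non-standard letters (e.g. "X") are not in the
    # vocabulary and are therefore ignored.
    total = sum(counts[g] for g in vocabulary)
    return [counts[g] / total if total else 0.0 for g in vocabulary]


# Made-up example sequence, not taken from the article's data set.
features = ngram_features("MKVLAAGIVGLLLAQ", n=2)
print(len(features))
```

With `n=2` every protein, whatever its length, is mapped to a vector of 400 values, which is what allows a standard classifier to be trained on sequences of different lengths.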

Author information

Authors and Affiliations

Department of Computer Science, Xiamen University, Xiamen, 361005, China
Xiangxiang Zeng, Sisi Yuan, Xianxian Huang & Quan Zou

Corresponding author

Correspondence to Quan Zou.

Additional information

Xiangxiang Zeng received his BS degree in automation from Hunan University, China in 2005, and his PhD in systems engineering from Huazhong University of Science and Technology, China in 2011. From 2010 to 2011 he spent one year working in the group of natural computing in Seville University, Spain. Currently, he is an assistant professor in the Department of Computer Science, Xiamen University, China. His main research interests include membrane computing, neural computing and automaton theory.

Sisi Yuan is a Master's student in the Department of Computer Science at Xiamen University, China. She received her BS degree in software engineering from Hangzhou Dianzi University, China. Her research interests include data mining and bioinformatics.

Xianxian Huang is an undergraduate student in the Department of Computer Science at Xiamen University, China. His main research interests are data mining and bioinformatics.

Quan Zou is an associate professor of computer science at Xiamen University, China. He received his PhD degree from Harbin Institute of Technology, China in 2009. His research is in the areas of bioinformatics, machine learning and parallel computing. His current focus is on genome assembly, annotation, and functional analysis of next-generation sequencing data with parallel computing methods. Several related works have been published in Briefings in Bioinformatics, Bioinformatics, PLOS ONE, and IEEE/ACM Transactions on Computational Biology and Bioinformatics. He serves on many impactful journals and the National Natural Science Foundation of China.

About this article

Cite this article

Zeng, X., Yuan, S., Huang, X. et al. Identification of cytokine via an improved genetic algorithm. Front. Comput. Sci. 9, 643–651 (2015). https://doi.org/10.1007/s11704-014-4089-3

Received: 04 March 2014
Accepted: 22 May 2014
Published: 03 November 2014
Issue Date: August 2015
DOI: https://doi.org/10.1007/s11704-014-4089-3
