Improving ELM-based microarray data classification by diversified sequence features selection

Zhao, Yuhai; Wang, Guoren; Yin, Ying; Li, Yuan; Wang, Zhanghui

doi:10.1007/s00521-014-1571-7

Extreme Learning Machine and Applications
Published: 16 April 2014

Volume 27, pages 155–166 (2016)
Neural Computing and Applications

Abstract

In this paper, we focus on the problem of extreme learning machine (ELM)-based microarray data classification. Unlike the traditional classification problem, the goal here is not only to predict the class labels of unseen samples but also to make clear what leads to the results, i.e., which genes are involved in a specific disease. This is especially significant for biologists, who need to decipher the causes of disease. As a black-box method, ELM cannot accomplish this task by itself. In this work, we propose a diversified sequence feature selection-based framework to address the problem. In this framework, (1) a sequence model, EWave, is introduced to make the structural ordering information among genes exploitable; (2) the concept of an irreducible sequence is proposed, in which the genes work as an ordered whole to maintain high confidence with respect to a specific class, and removing any of the genes decreases the confidence substantially. An efficient sequence mining algorithm, together with several effective pruning rules, is developed to mine such sequences; and (3) we study how to extract a set of diversified sequence features as representatives of all mined results. This problem is proved to be NP-hard, and a greedy algorithm is presented to approximate the optimal solution. Experimental results show that the proposed approach significantly improves both the efficiency and the effectiveness of ELM compared with several widely used feature selection techniques.
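The abstract defines irreducible sequences and diversified selection only informally, so the following is a minimal sketch of one possible reading, not the paper's algorithm. Every concrete choice here is an assumption: `to_sequence` is a hypothetical stand-in for EWave that orders genes by increasing expression; confidence is taken as the fraction of supporting samples that carry the target class; "decreases the confidence substantially" is modeled by a hypothetical margin `delta`; and diversified selection is approximated by greedy maximum coverage of the class's samples, which may differ from the paper's actual objective. The paper's pruning rules are not reproduced.

```python
from itertools import combinations


# Hypothetical stand-in for the EWave sequence model: a sample's expression
# vector becomes the tuple of gene indices sorted by increasing expression.
def to_sequence(expr):
    return tuple(sorted(range(len(expr)), key=lambda g: expr[g]))


def supports(seq, pattern):
    """True if the genes in `pattern` appear in `seq` in the same relative order."""
    pos = {g: i for i, g in enumerate(seq)}
    idx = [pos[g] for g in pattern]
    return all(a < b for a, b in zip(idx, idx[1:]))


def confidence(pattern, seqs, labels, cls):
    """Assumed definition: fraction of supporting samples that belong to `cls`."""
    hits = [y for s, y in zip(seqs, labels) if supports(s, pattern)]
    return sum(y == cls for y in hits) / len(hits) if hits else 0.0


def is_irreducible(pattern, seqs, labels, cls, min_conf, delta):
    """Assumed test: confidence >= min_conf, and every proper sub-pattern
    (genes kept in their original order) scores at most conf - delta."""
    conf = confidence(pattern, seqs, labels, cls)
    if conf < min_conf:
        return False
    for k in range(1, len(pattern)):
        for sub in combinations(pattern, k):  # combinations preserve input order
            if confidence(sub, seqs, labels, cls) > conf - delta:
                return False
    return True


def class_cover(pattern, seqs, labels, cls):
    """Indices of class-`cls` samples that support `pattern`."""
    return {i for i, (s, y) in enumerate(zip(seqs, labels))
            if y == cls and supports(s, pattern)}


def greedy_diverse(patterns, seqs, labels, cls, k):
    """Greedy max-coverage stand-in for diversified selection: repeatedly pick
    the pattern covering the most not-yet-covered class-`cls` samples."""
    covered, chosen, remaining = set(), [], list(patterns)
    while remaining and len(chosen) < k:
        best = max(remaining,
                   key=lambda p: len(class_cover(p, seqs, labels, cls) - covered))
        new = class_cover(best, seqs, labels, cls) - covered
        if not new:
            break  # no remaining candidate adds coverage; stop early
        chosen.append(best)
        covered |= new
        remaining.remove(best)
    return chosen
```

The exhaustive sub-pattern check in `is_irreducible` is exponential in pattern length, which is acceptable only for short patterns; an efficient miner would need pruning of the kind the abstract mentions.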

Notes

That is, IG (information gain), TR (twoing rule), SM (sum minority), MM (max minority), GI (Gini index) and SV (sum of variance).
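As a hedged illustration of two of these measures, the sketch below scores a single gene with IG and the Gini index for a binary split of its expression values at a threshold. The function names, the `<=` split convention, and the midpoint threshold search are our own assumptions rather than details from the paper or any specific ranking tool; TR, SM, MM and SV are not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values()) if n else 0.0


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def split(values, labels, threshold):
    """Partition labels by whether the gene's expression is <= threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    return left, right


def information_gain(values, labels, threshold):
    """IG: entropy before the split minus size-weighted entropy after it (higher is better)."""
    left, right = split(values, labels, threshold)
    n = len(labels)
    return entropy(labels) - len(left) / n * entropy(left) - len(right) / n * entropy(right)


def weighted_gini(values, labels, threshold):
    """Size-weighted Gini impurity after the split (lower is better)."""
    left, right = split(values, labels, threshold)
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_ig_threshold(values, labels):
    """Try midpoints between consecutive distinct values and return (threshold, IG).
    Requires at least two distinct expression values."""
    vs = sorted(set(values))
    cands = [(a + b) / 2 for a, b in zip(vs, vs[1:])]
    return max(((t, information_gain(values, labels, t)) for t in cands),
               key=lambda pair: pair[1])
```

A gene whose expression perfectly separates two balanced classes reaches the maximum IG of 1 bit and a weighted Gini impurity of 0 at the separating threshold.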

Acknowledgments

Supported by 863 program (2012AA011004), 973 program (2011CB302200-G), National Science Fund for Distinguished Young Scholars (61025007), State Key Program of National Natural Science of China (60933001, 61332014), National Natural Science Foundation of China (61272182, 61100028, 61073063, 61173030), New Century Excellent Talents (NCET-11-0085), China Postdoctoral Science Foundation (2012T50263, 2011M500568), and Fundamental Research Funds for the Central Universities (N110404005, N110404017).

Author information

Authors and Affiliations

College of Information Science and Engineering, Northeastern University, Shenyang, 110819, China
Yuhai Zhao, Guoren Wang, Ying Yin, Yuan Li & Zhanghui Wang

Corresponding author

Correspondence to Yuhai Zhao.

About this article

Cite this article

Zhao, Y., Wang, G., Yin, Y. et al. Improving ELM-based microarray data classification by diversified sequence features selection. Neural Comput & Applic 27, 155–166 (2016). https://doi.org/10.1007/s00521-014-1571-7

Received: 15 September 2013
Accepted: 12 March 2014
Published: 16 April 2014
Issue Date: January 2016
DOI: https://doi.org/10.1007/s00521-014-1571-7