
Improving ELM-based microarray data classification by diversified sequence features selection

  • Extreme Learning Machine and Applications
Neural Computing and Applications

Abstract

In this paper, we focus on the problem of extreme learning machine (ELM)-based microarray data classification. Unlike the traditional classification problem, the goal here is not only to predict the class labels of unseen samples, but also to make clear what leads to the results, i.e., which genes are involved in a specific disease. This is especially significant for biologists, who need to decipher the causes of disease. As a black-box method, ELM cannot accomplish this task by itself. In this work, we propose a diversified sequence feature selection-based framework to address the problem. In this framework, (1) a sequence model, EWave, is introduced to make the structural ordering information among genes exploitable; (2) the concept of an irreducible sequence is proposed, in which the genes work as an ordered whole to keep high confidence with a specific class, and removing any gene substantially decreases that confidence. An efficient sequence mining algorithm, together with some effective pruning rules, is developed to mine such sequences; and (3) we study how to extract a set of diversified sequence features as the representative of all mined results. The problem is proved to be NP-hard, and a greedy algorithm is presented to approximate the optimal solution. Experimental results show that the proposed approach significantly improves the efficiency and the effectiveness of ELM compared with some widely used feature selection techniques.
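To give a feel for step (3), the following is a minimal sketch of a greedy diversified selection, not the algorithm of the paper. The abstract does not state the objective being approximated, so the score used here (confidence discounted by the largest gene overlap with anything already chosen) is an assumption, as are the names `jaccard`, `greedy_diverse`, the `Candidate` type, and the toy gene labels and confidence values.

```python
from typing import List, Sequence, Tuple

# A candidate feature: an ordered gene sequence plus a confidence score.
# Both the representation and the scores below are illustrative assumptions.
Candidate = Tuple[Tuple[str, ...], float]


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Gene overlap between two sequences, ignoring order (an assumed measure)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def greedy_diverse(candidates: List[Candidate], k: int) -> List[Candidate]:
    """Pick up to k candidates, each round taking the best penalized score."""
    chosen: List[Candidate] = []
    remaining = list(candidates)
    while remaining and len(chosen) < k:
        def gain(c: Candidate) -> float:
            genes, conf = c
            # Discount by the largest overlap with any already chosen sequence.
            redundancy = max((jaccard(genes, s[0]) for s in chosen), default=0.0)
            return conf * (1.0 - redundancy)

        best = max(remaining, key=gain)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical mined sequences with made-up confidence values.
mined = [
    (("g1", "g2", "g3"), 0.95),
    (("g1", "g2", "g4"), 0.93),
    (("g7", "g8"), 0.80),
    (("g5",), 0.60),
]
selected = greedy_diverse(mined, k=2)
```

In this toy run the second-highest-confidence sequence shares two of its genes with the first pick, so its discounted score falls below that of the non-overlapping sequence, which is selected instead. A greedy pass of this kind offers no optimality guarantee in general; it only illustrates how redundancy among mined sequences can be traded against confidence.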

Notes

  1. That is, IG (information gain), TR (twoing rule), SM (sum minority), MM (max minority), GI (Gini index) and SV (sum of variance).

Acknowledgments

Supported by 863 program (2012AA011004), 973 program (2011CB302200-G), National Science Fund for Distinguished Young Scholars (61025007), State Key Program of National Natural Science of China (60933001, 61332014), National Natural Science Foundation of China (61272182, 61100028, 61073063, 61173030), New Century Excellent Talents (NCET-11-0085), China Postdoctoral Science Foundation (2012T50263, 2011M500568), and Fundamental Research Funds for the Central Universities (N110404005, N110404017).

Author information

Corresponding author

Correspondence to Yuhai Zhao.

Cite this article

Zhao, Y., Wang, G., Yin, Y. et al. Improving ELM-based microarray data classification by diversified sequence features selection. Neural Comput & Applic 27, 155–166 (2016). https://doi.org/10.1007/s00521-014-1571-7

  • DOI: https://doi.org/10.1007/s00521-014-1571-7
