A new hybrid stability measure for feature selection

Applied Intelligence

Abstract

Feature selection (FS) algorithms are applied in bioinformatics to identify disease-causing genes. The performance of such algorithms is measured in terms of the accuracy of the resulting model and the stability of the FS algorithm. Stability measures how consistently the same feature set is selected across repeated executions. Recent research has shown that a stability measure must satisfy a set of properties: it must be fully defined, monotonic, and bounded, must attain its maximum deterministically, and must be corrected for chance. Among the existing stability measures, only Nogueira's frequency-based stability measure satisfies all the required properties. However, frequency-based stability measures fail to discriminate between cases in which the overall frequencies of the features are the same. To address this issue, the paper proposes a hybrid similarity-based stability measure that satisfies all of the desirable properties listed above. The proposed measure is the first similarity-based stability measure to satisfy all the required properties, and each of these properties is established mathematically. The paper also proposes a combination of the frequency-based and similarity-based measures that preserves the aspects of both approaches. Finally, the work analyzes the stability of LASSO and Elastic Net using synthetic and microarray gene expression datasets. Elastic Net shows higher stability and better selection of relevant features than LASSO.
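
The limitation of frequency-based measures described above can be illustrated with a small sketch. The code below is not the hybrid measure proposed in the paper. It implements the frequency-based measure of Nogueira et al. as it is commonly stated, and compares it with a plain average pairwise Jaccard similarity, which is used here only as a generic stand-in for a similarity-based measure. The function names and the two toy sets of runs are assumptions made for illustration.

```python
from itertools import combinations


def nogueira_stability(selections, d):
    """Frequency-based stability for M selected subsets over d features.

    Assumes at least two runs and an average subset size strictly
    between 0 and d (otherwise the denominator is zero).
    """
    m = len(selections)
    # p_f: fraction of runs in which feature f was selected
    freq = [sum(f in s for s in selections) / m for f in range(d)]
    # Unbiased sample variance of each feature's selection indicator
    var = [m / (m - 1) * p * (1 - p) for p in freq]
    # Average number of selected features per run
    k_bar = sum(len(s) for s in selections) / m
    return 1 - (sum(var) / d) / ((k_bar / d) * (1 - k_bar / d))


def mean_jaccard(selections):
    """Average Jaccard similarity over all pairs of non-empty subsets."""
    pairs = list(combinations(selections, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


# Two hypothetical sets of four runs over d = 4 features.
# Every feature is selected in exactly half of the runs in both cases.
runs_a = [{0, 1}, {0, 1}, {2, 3}, {2, 3}]
runs_b = [{0, 1}, {2, 3}, {0, 2}, {1, 3}]

print(nogueira_stability(runs_a, 4), nogueira_stability(runs_b, 4))
print(mean_jaccard(runs_a), mean_jaccard(runs_b))
```

Because the frequency-based value depends only on the per-feature selection frequencies and the average subset size, it is identical for both sets of runs, whereas the pairwise similarity is higher for `runs_a`, where two pairs of runs agree exactly.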

Author information

Corresponding author

Correspondence to Akshata K. Naik.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Naik, A.K., Kuppili, V. & Edla, D.R. A new hybrid stability measure for feature selection. Appl Intell 50, 3471–3486 (2020). https://doi.org/10.1007/s10489-020-01731-2

  • DOI: https://doi.org/10.1007/s10489-020-01731-2
