
Symmetric uncertainty class-feature association map for feature selection in microarray dataset

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

When the number of features is very large relative to the number of samples, feature selection is a useful preprocessing step that removes irrelevant and redundant features from the final feature subset. DNA microarray data is one active application area for feature selection: the number of dimensions grows quickly, which calls for further research on selection methods. Modeling the feature search space as a graph makes the features easier to visualize and allows graph-theoretic concepts to be used in the selection process. This paper proposes a filter-based feature selection algorithm that uses graph techniques to reduce the dimensionality of a dataset, named Symmetric Uncertainty Class-Feature Association Map feature selection (SU-CFAM). In the first step, SU-CFAM uses the symmetric uncertainty measure to represent the feature search space as a graph. After partitioning the graph into several clusters with a community detection algorithm, SU-CFAM constructs an adjacency matrix for each cluster and selects the final subset using the concept of a maximal independent set. The performance of SU-CFAM is compared with five well-known feature selection approaches using three classifiers: SVM, DT, and NB. Experiments on fifteen public DNA microarray datasets show that SU-CFAM can achieve better classification performance than the other methods.
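The two building blocks named in the abstract, symmetric uncertainty and the maximal independent set, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `redundancy_threshold` parameter, the relevance-ordered greedy strategy, and the toy data are all assumptions made for illustration. The community detection step is omitted, so the sketch treats the given features as a single cluster. It uses the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) and assumes the expression values have already been discretized.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def greedy_maximal_independent_set(adjacency, order):
    """Visit nodes in `order`; keep a node unless a neighbour was already kept."""
    chosen = []
    for node in order:
        if not any(adjacency[node][kept] for kept in chosen):
            chosen.append(node)
    return chosen


def select_features(features, labels, redundancy_threshold=0.5):
    """Hypothetical selection step for one cluster of features.

    Two features are connected when their pairwise SU reaches the
    (assumed) threshold; nodes are visited by decreasing SU with the class.
    """
    names = list(features)
    relevance = {f: symmetric_uncertainty(features[f], labels) for f in names}
    adjacency = {
        a: {
            b: a != b
            and symmetric_uncertainty(features[a], features[b]) >= redundancy_threshold
            for b in names
        }
        for a in names
    }
    order = sorted(names, key=lambda f: relevance[f], reverse=True)
    return greedy_maximal_independent_set(adjacency, order)


# Toy, already-discretized data (made up for illustration).
toy_features = {
    "g1": [0, 0, 1, 1],
    "g2": [0, 0, 1, 1],  # exact duplicate of g1, hence redundant
    "g3": [0, 1, 0, 1],
}
toy_labels = [0, 0, 1, 1]
selected = select_features(toy_features, toy_labels)
```

In this toy case `g2` is connected to `g1` and is dropped, while `g3` shares no edge with `g1` and is kept. An independent set removes redundancy only, so a practical pipeline would presumably also filter out features with low class relevance.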



Author information

Corresponding author

Correspondence to Reza Azmi.



Cite this article

Bakhshandeh, S., Azmi, R. & Teshnehlab, M. Symmetric uncertainty class-feature association map for feature selection in microarray dataset. Int. J. Mach. Learn. & Cyber. 11, 15–32 (2020). https://doi.org/10.1007/s13042-019-00932-7


  • DOI: https://doi.org/10.1007/s13042-019-00932-7
