Abstract
Feature selection for mixed data is an active research area with many applications in practical problems where both numerical and non-numerical features describe the objects of study. This paper provides the first comprehensive and structured review of the supervised and unsupervised feature selection methods for mixed data reported in the literature. Additionally, we analyze the main characteristics, advantages, and disadvantages of the reviewed methods and discuss important open challenges and promising directions for future research in this field.
Notes
Also called heterogeneous or assorted data.
The label assigned to each object in the dataset can be a category, an ordered value, or a real value, depending on the specific task.
For the case of UFS methods, class labels are not used in this step.
A parameter given by the user in the range (0, 1) that specifies the average fraction of features per cluster.
Acknowledgements
The first author gratefully acknowledges the Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE) for the collaboration grant awarded for the completion of this survey.
Solorio-Fernández, S., Carrasco-Ochoa, J. & Martínez-Trinidad, J.F. A survey on feature selection methods for mixed data. Artif Intell Rev 55, 2821–2846 (2022). https://doi.org/10.1007/s10462-021-10072-6