ABSTRACT
Feature selection plays an important role in data mining and machine learning: it reduces the dimensionality of data and improves the performance of classification algorithms. A variety of feature selection methods have been proposed in the literature to address problems such as the large search space of high-dimensional datasets, e.g., microarray data. However, identifying the feature selection method best suited to a specific scenario remains a challenging task. In this paper, we present a comprehensive survey of recent research on feature selection methods, their types, strengths, weaknesses, and recent contributions in related areas. We also discuss current issues and challenges in order to identify future research directions.
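To make the role of feature selection concrete, below is a minimal sketch of a filter-style method using scikit-learn; the dataset, the chi-squared scoring function, and the choice of k = 10 are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of filter-based feature selection (illustrative only).
# Assumes scikit-learn is installed; dataset and k are arbitrary choices.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)  # 30 non-negative features

# Score each feature against the class label and keep the 10 highest-scoring.
# chi2 requires non-negative feature values, which holds for this dataset.
selector = SelectKBest(score_func=chi2, k=10)
X_reduced = selector.fit_transform(X, y)

# Compare classifier accuracy before and after feature selection.
clf = KNeighborsClassifier()
print("All features:   ", cross_val_score(clf, X, y, cv=5).mean())
print("Top-10 features:", cross_val_score(clf, X_reduced, y, cv=5).mean())
```

In a real evaluation the selector would be placed inside a cross-validation pipeline so that feature scores are computed only on training folds; it is applied to the full dataset here purely to keep the sketch short.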