Abstract
Feature selection has proven effective for analyzing high-dimensional data and building efficient learning models. This study proposes a feature selection method based on feature grouping and a genetic algorithm (FS-FGGA) to obtain a discriminative feature subset and reduce irrelevant and redundant data. First, it eliminates irrelevant features using the symmetrical uncertainty between features and class labels. Then, it groups the remaining features by approximate Markov blanket. Finally, a genetic algorithm is applied to search for the optimal feature subset across the groups. Experiments on eight public datasets demonstrate that FS-FGGA is effective and outperforms SVM-RFE and ECBGS in most cases.
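The first two steps of the abstract can be illustrated with a minimal sketch. It assumes discrete feature values, the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), and the approximate Markov blanket criterion of Yu and Liu (2004): a feature F_i covers F_j when SU(F_i, C) >= SU(F_j, C) and SU(F_i, F_j) >= SU(F_j, C). The function names, the `threshold` parameter, the greedy "attach to the first covering leader" grouping rule, and the toy data are all hypothetical choices made for illustration; the abstract does not specify how FS-FGGA forms its groups, and the genetic algorithm step is not shown.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); ranges from 0 to 1."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def group_features(features, labels, threshold=0.0):
    """Drop irrelevant features, then group the rest (illustrative rule).

    features: dict mapping feature name -> list of discrete values.
    Returns a list of groups; the first name in each group is its leader.
    """
    relevance = {name: symmetrical_uncertainty(col, labels)
                 for name, col in features.items()}
    # Step 1: keep only features whose SU with the class exceeds the threshold.
    ranked = sorted((n for n in relevance if relevance[n] > threshold),
                    key=lambda n: -relevance[n])
    # Step 2: visit features by decreasing relevance, so every existing leader
    # already satisfies SU(leader, C) >= SU(name, C). A feature joins the first
    # group whose leader also satisfies SU(leader, name) >= SU(name, C).
    groups = []
    for name in ranked:
        for group in groups:
            leader = group[0]
            if symmetrical_uncertainty(features[leader], features[name]) >= relevance[name]:
                group.append(name)
                break
        else:
            groups.append([name])  # no leader covers it: start a new group
    return groups


# Toy data (hypothetical): four classes encoded by a high bit and a low bit.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "hi":      [0, 0, 0, 0, 1, 1, 1, 1],  # high bit of the class
    "hi_copy": [1, 1, 1, 1, 0, 0, 0, 0],  # redundant with "hi"
    "lo":      [0, 0, 1, 1, 0, 0, 1, 1],  # low bit, independent of "hi"
    "noise":   [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the class
}
groups = group_features(features, labels, threshold=0.1)
print(groups)  # [['hi', 'hi_copy'], ['lo']]
```

In this toy case the noise feature is filtered out, the redundant copy is absorbed into the group led by "hi", and "lo" forms its own group. A genetic algorithm in the spirit of the abstract would then search over combinations of features drawn from these groups, with a fitness function such as classification accuracy.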
Acknowledgments
This study was supported by the State Key Science & Technology Project for Infectious Diseases (2012ZX10002011), the Sino-German Center for Research Promotion (GZ 753), and the National Natural Science Foundation of China (21375011).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Lin, X., Wang, X., Xiao, N., Huang, X., Wang, J. (2015). A Feature Selection Method Based on Feature Grouping and Genetic Algorithm. In: He, X., et al. Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques. IScIDE 2015. Lecture Notes in Computer Science, vol 9243. Springer, Cham. https://doi.org/10.1007/978-3-319-23862-3_15
DOI: https://doi.org/10.1007/978-3-319-23862-3_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23861-6
Online ISBN: 978-3-319-23862-3
eBook Packages: Computer Science, Computer Science (R0)