Abstract
Feature selection has become one of the most active research areas in data mining. It removes redundant and irrelevant attributes from large datasets, and the literature offers several methods for selecting attributes. In this article, a new multi-objective method is proposed to select relevant and non-redundant features. The proposed feature selection method is divided into three steps. The first step computes a relevance value for each feature based on random forests. The second step computes the dissimilarity matrix representing the dependence between the features of the training datasets and transforms it into a complete graph whose nodes represent features and whose edges carry the dissimilarity values between them. The last step is the optimization, in which a multi-objective optimization algorithm is applied. The proposed method is applied to many datasets to find the most relevant and non-redundant features, and its performance is compared with that of the popular MBEGA, mRMR (MIQ) and mRMR (MID) methods.
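The second and third steps can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration, not the authors' implementation: the dissimilarity measure (1 minus the absolute Pearson correlation), the two objectives (mean relevance and mean pairwise dissimilarity of a subset), the exhaustive Pareto filter standing in for the unspecified multi-objective optimizer, and the toy data. The relevance scores are hypothetical placeholders for random forest importances, which in practice could come from, for example, scikit-learn's `RandomForestClassifier.feature_importances_`.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def dissimilarity_graph(columns):
    """Complete graph over features: one weighted edge per unordered pair.

    Edge weight is 1 - |Pearson correlation| (an assumed measure).
    """
    return {
        (i, j): 1.0 - abs(pearson(columns[i], columns[j]))
        for i, j in combinations(range(len(columns)), 2)
    }


def objectives(subset, relevance, graph):
    """Return (mean relevance, mean pairwise dissimilarity); both are maximised."""
    rel = mean(relevance[i] for i in subset)
    pairs = list(combinations(sorted(subset), 2))
    dis = mean(graph[p] for p in pairs) if pairs else 0.0
    return rel, dis


def pareto_front(candidates, relevance, graph):
    """Keep the candidate subsets that no other candidate dominates."""
    scored = [(s, objectives(s, relevance, graph)) for s in candidates]

    def dominated(a):
        # b dominates a if it is at least as good on both objectives and differs.
        return any(b[0] >= a[0] and b[1] >= a[1] and b != a for _, b in scored)

    return [s for s, obj in scored if not dominated(obj)]


# Toy data: four feature columns; feature 1 is an exact multiple of feature 0.
columns = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [2, 5, 1, 4, 3],
    [1, 0, 1, 0, 1],
]
# Hypothetical relevance scores standing in for random forest importances.
relevance = [0.40, 0.10, 0.35, 0.15]

graph = dissimilarity_graph(columns)
candidates = list(combinations(range(len(columns)), 2))
front = pareto_front(candidates, relevance, graph)
print(front)
```

Enumerating every candidate subset is only feasible for a handful of features. For realistic dimensions, a population-based multi-objective search would replace the exhaustive filter while keeping the same two objectives.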
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Noura, A., Shili, H., Romdhane, L.B. (2017). Reliable Attribute Selection Based on Random Forest (RASER). In: Madureira, A., Abraham, A., Gamboa, D., Novais, P. (eds) Intelligent Systems Design and Applications. ISDA 2016. Advances in Intelligent Systems and Computing, vol 557. Springer, Cham. https://doi.org/10.1007/978-3-319-53480-0_2
DOI: https://doi.org/10.1007/978-3-319-53480-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53479-4
Online ISBN: 978-3-319-53480-0
eBook Packages: Engineering, Engineering (R0)