Covering Arrays to Support the Process of Feature Selection in the Random Forest Classifier

Vivas, Sebastián; Cobos, Carlos; Mendoza, Martha

doi:10.1007/978-3-030-13709-0_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11331)

Included in the following conference series:

International Conference on Machine Learning, Optimization, and Data Science

Abstract

The Random Forest (RF) algorithm is an ensemble of base decision trees, each built from a Bootstrap subset of the original dataset. Each subset combines a sample of instances (rows) with a random subset of the features (variables or columns) of the dataset to be classified. In RF, the base trees are not pruned, and to classify a new record each tree casts a vote; the class with the most votes is selected. The state of the art reports that random feature selection when constructing the Bootstrap subsets decreases the quality of the results achieved with RF. To address this, this work proposes integrating covering arrays (CA) into RF, in an algorithm called RFCA. In RFCA, the number N of rows of the CA defines the minimum number of base trees that must be generated, and each row of the CA defines the features that the corresponding Bootstrap subset uses to build its tree. To evaluate the proposal, RFCA is compared with the RF implementation available in Weka on 32 datasets from the UCI repository. The experiments show that using a CA of strength 2 to 7 obtains promising results in terms of accuracy.
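
The role of the CA described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the binary encoding (a 1 in column j means feature j is used by that tree), the example array, and the helper names is_covering_array, feature_subsets and majority_vote are all assumptions made for illustration. The sketch checks that a small array has strength 2, derives one feature subset per row (so N rows give N trees), and shows the majority vote used to combine tree predictions.

```python
from collections import Counter
from itertools import combinations, product


def is_covering_array(rows, strength, levels=2):
    """Return True if every choice of `strength` columns contains
    every combination of symbols at least once."""
    k = len(rows[0])
    needed = set(product(range(levels), repeat=strength))
    for cols in combinations(range(k), strength):
        seen = {tuple(row[c] for c in cols) for row in rows}
        if not needed <= seen:
            return False
    return True


def feature_subsets(rows):
    """Assumed encoding: row i selects the features whose column holds a 1."""
    return [[j for j, bit in enumerate(row) if bit == 1] for row in rows]


def majority_vote(votes):
    """Return the class label that received the most votes."""
    return Counter(votes).most_common(1)[0][0]


# Hypothetical binary covering array with N = 5 rows, k = 4 features, strength 2.
CA = [
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]

if __name__ == "__main__":
    print(is_covering_array(CA, 2))   # strength-2 coverage check
    for tree_id, feats in enumerate(feature_subsets(CA)):
        print(f"tree {tree_id}: features {feats}")
    print(majority_vote(["yes", "no", "yes"]))
```

In a full pipeline, each feature list would restrict the columns of the Bootstrap sample passed to one unpruned tree, and majority_vote would be applied to the predictions of all N trees.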

Author information

Authors and Affiliations

Information Technology Research Group (GTI), Universidad del Cauca, Popayán, Colombia
Sebastián Vivas, Carlos Cobos & Martha Mendoza

Corresponding author

Correspondence to Carlos Cobos.

Editor information

Editors and Affiliations

University of Catania, Catania, Italy and University of Reading, Reading, UK
Giuseppe Nicosia
University of Florida, Gainesville, FL, USA
Panos Pardalos
University of Catania, Catania, Italy
Giovanni Giuffrida
Harvard University, Cambridge, MA, USA
Renato Umeton
IBM, Tivoli Research Lab, Rome, Italy
Vincenzo Sciacca

About this paper

Cite this paper

Vivas, S., Cobos, C., Mendoza, M. (2019). Covering Arrays to Support the Process of Feature Selection in the Random Forest Classifier. In: Nicosia, G., Pardalos, P., Giuffrida, G., Umeton, R., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2018. Lecture Notes in Computer Science, vol 11331. Springer, Cham. https://doi.org/10.1007/978-3-030-13709-0_6

DOI: https://doi.org/10.1007/978-3-030-13709-0_6
Published: 14 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13708-3
Online ISBN: 978-3-030-13709-0
eBook Packages: Computer Science, Computer Science (R0)
