Abstract
This paper introduces ForestDisc, an optimized, supervised, multivariate, and nonparametric discretization algorithm based on tree ensemble learning and moment-matching optimization. At its core, for each continuous attribute in the data space, ForestDisc uses moment matching to select representative split points from those generated while constructing a random forest model. An extensive empirical study involving 50 benchmark datasets and six classification algorithms shows that ForestDisc is highly competitive with 20 major discretizers on both intrinsic and extrinsic performance measures. The intrinsic metrics include the number of resulting bins per variable and the execution time needed to discretize an attribute. The extrinsic metrics assess the discretizers when applied as a preprocessing step to classification tasks, and include accuracy, F1, and Kappa. ForestDisc also achieves an excellent trade-off between intrinsic and extrinsic performance measures.
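The core idea summarized above (harvest the split points a random forest produces for each continuous attribute, then reduce them to a few representative cut points by moment matching) can be sketched as follows. This is a minimal illustrative sketch, not the authors' R implementation: the helper names, the number of cuts `k`, the number of matched moments, and the use of SciPy's Nelder–Mead solver are all assumptions made for the example.

```python
# Illustrative sketch (not the ForestDisc implementation): collect the
# thresholds a random forest uses on one attribute, then choose k cut
# points whose first sample moments match those of the full threshold set.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from scipy.optimize import minimize

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def forest_splits(forest, feature):
    """All thresholds the forest's trees applied to one feature."""
    splits = []
    for est in forest.estimators_:
        t = est.tree_
        mask = t.feature == feature        # internal nodes splitting on this feature
        splits.extend(t.threshold[mask])
    return np.array(splits)

def moment_match(splits, k=3, n_moments=4):
    """Pick k cut points matching the first n_moments of the split set."""
    target = [np.mean(splits ** m) for m in range(1, n_moments + 1)]

    def loss(c):
        # squared mismatch between the candidate cuts' moments and the targets
        return sum((np.mean(c ** m) - target[m - 1]) ** 2
                   for m in range(1, n_moments + 1))

    x0 = np.quantile(splits, np.linspace(0.25, 0.75, k))  # quantile warm start
    res = minimize(loss, x0, method="Nelder-Mead")
    return np.sort(res.x)

cuts = moment_match(forest_splits(forest, feature=2), k=3)
print(cuts)  # k cut points for discretizing attribute 2 (petal length)
```

In this toy version a derivative-free local solver suffices; the paper's benchmark compares a range of derivative-free and global NLP solvers for this step.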






Availability of data and materials
All data used are publicly available in the UCI Machine Learning repository and the KEEL dataset repository.
Code Availability Statement
The implementation and computational work were carried out using the R language and environment for statistical computing. The code, data files, and result files for the benchmark reported in the article are available at https://github.com/HMAISSAE/ForestDisc_Bench.git.
Funding
Not applicable.
Author information
Authors and Affiliations
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Haddouchi Maissae. The first draft of the manuscript was written by Haddouchi Maissae and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Maissae, H., Abdelaziz, B. A novel approach for discretizing continuous attributes based on tree ensemble and moment matching optimization. Int J Data Sci Anal 14, 45–63 (2022). https://doi.org/10.1007/s41060-022-00316-1