Abstract
This paper introduces ForestDisc, an optimized, supervised, multivariate, and nonparametric discretization algorithm based on tree ensemble learning and moment-matching optimization. At its core, for each continuous attribute in the data space, ForestDisc uses moment matching to select representative split points from those generated while constructing a random forest model. An extensive empirical study involving 50 benchmark datasets and six classification algorithms shows that ForestDisc is highly competitive with 20 major discretizers on both intrinsic and extrinsic performance measures. The intrinsic metrics include the number of resulting bins per variable and the execution time needed to discretize an attribute. The extrinsic metrics assess the discretizers when applied as a preprocessing step to classification tasks, and include accuracy, F1, and Kappa. ForestDisc also achieves an excellent trade-off between intrinsic and extrinsic performance measures.
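The core idea summarized above (harvest the split points a random forest produces for each continuous attribute, then reduce them to a few representative cut points by moment matching) can be sketched as follows. This is a minimal illustrative sketch, not the authors' R implementation: the helper names, the number of cuts `k`, the number of matched moments, and the use of SciPy's Nelder–Mead solver are all assumptions made for the example.

```python
# Illustrative sketch (not the ForestDisc implementation): collect the
# thresholds a random forest uses on one attribute, then choose k cut
# points whose first sample moments match those of the full threshold set.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from scipy.optimize import minimize

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def forest_splits(forest, feature):
    """All thresholds the forest's trees applied to one feature."""
    splits = []
    for est in forest.estimators_:
        t = est.tree_
        mask = t.feature == feature        # internal nodes splitting on this feature
        splits.extend(t.threshold[mask])
    return np.array(splits)

def moment_match(splits, k=3, n_moments=4):
    """Pick k cut points matching the first n_moments of the split set."""
    target = [np.mean(splits ** m) for m in range(1, n_moments + 1)]

    def loss(c):
        # squared mismatch between the candidate cuts' moments and the targets
        return sum((np.mean(c ** m) - target[m - 1]) ** 2
                   for m in range(1, n_moments + 1))

    x0 = np.quantile(splits, np.linspace(0.25, 0.75, k))  # quantile warm start
    res = minimize(loss, x0, method="Nelder-Mead")
    return np.sort(res.x)

cuts = moment_match(forest_splits(forest, feature=2), k=3)
print(cuts)  # k cut points for discretizing attribute 2 (petal length)
```

In this toy version a derivative-free local solver suffices; the paper's benchmark compares a range of derivative-free and global NLP solvers for this step.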






Availability of data and materials
All data used are publicly available in the UCI Machine Learning repository and the KEEL dataset repository.
Code Availability Statement
The implementation and computational work were carried out using the R language and environment for statistical computing. The code, data files, and result files for the benchmark reported in the article are available at https://github.com/HMAISSAE/ForestDisc_Bench.git.
Funding
Not applicable.
Author information
Authors and Affiliations
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Haddouchi Maissae. The first draft of the manuscript was written by Haddouchi Maissae and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Maissae, H., Abdelaziz, B. A novel approach for discretizing continuous attributes based on tree ensemble and moment matching optimization. Int J Data Sci Anal 14, 45–63 (2022). https://doi.org/10.1007/s41060-022-00316-1