Abstract
Some numerical attributes may be reduced during discretization. This happens when a discretized attribute has only one interval, i.e., the entire domain of the numerical attribute is mapped into a single interval. The question is how such a reduction of data sets affects the error rate measured by the C4.5 decision tree generation system using ten-fold cross-validation. Our experiments on 15 numerical data sets show that, for the Dominant Attribute discretization method, the error rate is significantly larger (5% significance level, two-tailed test) for the reduced data sets. However, decision trees generated from the reduced data sets are significantly simpler than those generated from the original data sets.
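The reduction described in the abstract can be illustrated with a minimal sketch. It is not the Dominant Attribute method: the cut points are assumed to be given by some discretizer, and the names `discretize`, `reduce_attributes`, `columns`, and `cuts` are hypothetical, chosen only for this illustration. An attribute is dropped when all of its values fall into a single interval.

```python
from bisect import bisect_right


def discretize(values, cut_points):
    """Map each numerical value to the index of its interval.

    cut_points must be sorted; k cut points define k + 1 intervals.
    """
    return [bisect_right(cut_points, v) for v in values]


def reduce_attributes(columns, cuts):
    """Discretize every attribute and drop those left with one interval.

    columns: dict of attribute name -> list of numerical values
    cuts:    dict of attribute name -> list of cut points (assumed input,
             produced by whatever discretization method is in use)
    """
    kept = {}
    for name, values in columns.items():
        intervals = discretize(values, sorted(cuts.get(name, [])))
        # A single distinct interval means the whole domain was mapped
        # into one interval, so the attribute carries no information.
        if len(set(intervals)) > 1:
            kept[name] = intervals
    return kept


# Toy data: attribute "b" has no cut points, so it is reduced.
columns = {"a": [0.8, 1.2, 1.9, 2.4], "b": [10.0, 11.5, 12.0, 13.2]}
cuts = {"a": [1.5], "b": []}
reduced = reduce_attributes(columns, cuts)
print(reduced)
```

In this toy example only attribute `a` survives; the reduced data set would then be passed to the decision tree learner in place of the original one.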
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Grzymała-Busse, J.W., Mroczek, T. (2018). Attribute Selection Based on Reduction of Numerical Attributes During Discretization. In: Stańczyk, U., Zielosko, B., Jain, L. (eds) Advances in Feature Selection for Data and Pattern Recognition. Intelligent Systems Reference Library, vol 138. Springer, Cham. https://doi.org/10.1007/978-3-319-67588-6_2
DOI: https://doi.org/10.1007/978-3-319-67588-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67587-9
Online ISBN: 978-3-319-67588-6
eBook Packages: Engineering, Engineering (R0)