Attribute Selection Based on Reduction of Numerical Attributes During Discretization

Advances in Feature Selection for Data and Pattern Recognition

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 138)

Abstract

Some numerical attributes may be reduced during discretization. This happens when a discretized attribute has only one interval, i.e., the entire domain of the numerical attribute is mapped into a single interval. The question is how such a reduction of data sets affects the error rate measured by the C4.5 decision tree generation system using ten-fold cross-validation. Our experiments on 15 numerical data sets show that, for the Dominant Attribute discretization method, the error rate is significantly larger (5% significance level, two-tailed test) for the reduced data sets. However, decision trees generated from the reduced data sets are significantly simpler than those generated from the original data sets.
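
The reduction described above can be illustrated with a minimal sketch. It is not the chapter's implementation and does not reproduce the Dominant Attribute method; it only assumes that some discretizer has already produced, for each numerical attribute, a list of cut points (the hypothetical `cut_points` mapping below). An attribute with no cut points has its whole domain mapped into one interval, so it is dropped from the data set. The function names and the row format (one dict per case) are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def single_interval_attributes(cut_points: Dict[str, Sequence[float]]) -> List[str]:
    """Return the attributes whose discretization produced no cut points.

    Zero cut points means the entire domain of the attribute falls into
    a single interval, so the attribute cannot distinguish any cases.
    """
    return [name for name, cuts in cut_points.items() if len(cuts) == 0]


def reduce_data_set(rows: List[dict], cut_points: Dict[str, Sequence[float]]) -> List[dict]:
    """Drop every single-interval attribute from each row.

    Columns that are not listed in cut_points (e.g. the decision) are kept.
    """
    dropped = set(single_interval_attributes(cut_points))
    return [{k: v for k, v in row.items() if k not in dropped} for row in rows]


# Hypothetical example: attribute "b" received no cut points.
cut_points = {"a": [2.5], "b": [], "c": [1.0, 3.0]}
rows = [
    {"a": 1.0, "b": 5.0, "c": 2.0, "d": "yes"},
    {"a": 4.0, "b": 6.0, "c": 0.5, "d": "no"},
]
reduced = reduce_data_set(rows, cut_points)
```

In this sketch the reduced data set keeps attributes `a` and `c` and the decision `d`; the error rates of trees built from `rows` and from `reduced` could then be compared, as the abstract describes.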


Author information

Correspondence to Jerzy W. Grzymała-Busse.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Grzymała-Busse, J.W., Mroczek, T. (2018). Attribute Selection Based on Reduction of Numerical Attributes During Discretization. In: Stańczyk, U., Zielosko, B., Jain, L. (eds) Advances in Feature Selection for Data and Pattern Recognition. Intelligent Systems Reference Library, vol 138. Springer, Cham. https://doi.org/10.1007/978-3-319-67588-6_2

  • DOI: https://doi.org/10.1007/978-3-319-67588-6_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67587-9

  • Online ISBN: 978-3-319-67588-6

  • eBook Packages: Engineering, Engineering (R0)
