Abstract
Many supervised induction algorithms require discrete data, yet real data often comes in both discrete and continuous forms. Quality discretization of continuous attributes is an important problem: it affects the accuracy, complexity, variance, and understandability of the induced model. In practice, discretization and other statistical procedures are applied to samples of a population, since the entire population is inaccessible; a discretization computed on a sample is therefore only an estimate of the discretization for the whole population. Most existing discretization methods partition an attribute's range into two or more intervals using one or more cut points. In this paper, we introduce two variants of a resampling technique (such as the bootstrap) to generate a set of candidate discretization points, improving discretization quality by providing a better estimate with respect to the entire population. The goal of this paper is thus to observe whether this kind of resampling leads to better-quality discretization points, which opens a new paradigm for constructing soft decision trees.
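The abstract does not spell out the two variants, but the core idea can be illustrated with a minimal sketch: re-estimate a cut point on many bootstrap resamples of the data and collect the results as candidate discretization points. The sketch below assumes an entropy-minimizing binary split (in the spirit of Fayyad and Irani) as the base discretizer; all function names and the toy data are illustrative, not the authors' implementation.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_cut_point(x, y):
    """Single entropy-minimizing cut point (binary split) on attribute x."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    n = len(x_sorted)
    best_cut, best_score = None, np.inf
    # Candidate boundaries are midpoints between consecutive distinct values.
    for i in range(1, n):
        if x_sorted[i] == x_sorted[i - 1]:
            continue
        score = (i / n) * entropy(y_sorted[:i]) + ((n - i) / n) * entropy(y_sorted[i:])
        if score < best_score:
            best_score = score
            best_cut = (x_sorted[i] + x_sorted[i - 1]) / 2.0
    return best_cut

def bootstrap_cut_points(x, y, n_boot=100, seed=0):
    """Re-estimate the cut point on n_boot bootstrap resamples of (x, y)."""
    rng = np.random.default_rng(seed)
    cuts = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))  # sample with replacement
        c = best_cut_point(x[idx], y[idx])
        if c is not None:
            cuts.append(c)
    return np.array(cuts)

# Toy usage: a continuous attribute whose class changes near 0.5.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
y = (x > 0.5 + rng.normal(0, 0.05, 200)).astype(int)
cuts = bootstrap_cut_points(x, y, n_boot=50)
print("candidate cuts: mean=%.3f, std=%.3f" % (cuts.mean(), cuts.std()))
```

The spread of the resulting candidate cuts is what a soft split can exploit: instead of committing to a single hard threshold estimated from one sample, the empirical distribution of bootstrap cuts can define a transition region around the decision boundary.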
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Qureshi, T., Zighed, D.A. (2009). Using Resampling Techniques for Better Quality Discretization. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2009. Lecture Notes in Computer Science, vol. 5632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03070-3_6
DOI: https://doi.org/10.1007/978-3-642-03070-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03069-7
Online ISBN: 978-3-642-03070-3