Mining Numerical Data – A Rough Set Approach

Grzymala-Busse, Jerzy W.

doi:10.1007/978-3-642-11479-3_1

Jerzy W. Grzymala-Busse^18,19

Part of the book series: Lecture Notes in Computer Science ((TRS,volume 5946))

422 Accesses
11 Citations

Abstract

We present an approach to mining numerical data based on rough set theory using calculus of attribute-value blocks. An algorithm implementing these ideas, called MLEM2, induces high quality rules in terms of both simplicity (number of rules and total number of conditions) and accuracy. MLEM2 induces rules not only from complete data sets but also from data with missing attribute values, with or without numerical attributes. Additionally, we present experimental results on a comparison of three commonly used discretization techniques: equal interval width, equal interval frequency and minimal class entropy (all three methods were combined with the LEM2 rule induction algorithm) with MLEM2. Our conclusion is that even though MLEM2 was most frequently a winner, the differences between all four data mining methods are statistically insignificant.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Rough Sets in Machine Learning: A Review

Rseslib 3: Open Source Library of Rough Set and Machine Learning Methods

Rough-Set-Base Data Analysis: Theoretical Basis and Applications

References

Bajcar, S., Grzymala-Busse, J.W., Hippe, Z.S.: A comparison of six discretization algorithms used for prediction of melanoma. In: Proc. of the Eleventh International Symposium on Intelligent Information Systems, IIS 2002, Sopot, Poland, pp. 3–12. Physica-Verlag (2002)
Google Scholar
Bay, S.D.: Multivariate discretization of continuous variables for set mining. In: Proc. of the 6th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Boston, MA, pp. 315–319 (2000)
Google Scholar
Biba, M., Esposito, F., Ferilli, S., Mauro, N.D., Basile, T.M.A.: Unsupervised discretization using kernel density estimation. In: Proc. of the 20th Int. Conf. on AI, Hyderabad, India, pp. 696–701 (2007)
Google Scholar
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth & Brooks, Monterey (1984)
MATH Google Scholar
Catlett, J.: On changing continuous attributes into ordered discrete attributes. In: Kodratoff, Y. (ed.) EWSL 1991. LNCS (LNAI), vol. 482, pp. 164–178. Springer, Heidelberg (1991)
Chapter Google Scholar
Chan, C.C., Batur, C., Srinivasan, A.: Determination of quantization intervals in rule based model for dynamic systems. In: Proc. of the IEEE Conference on Systems, Man, and Cybernetics, Charlottesville, VA, pp. 1719–1723 (1991)
Google Scholar
Chan, C.C., Grzymala-Busse, J.W.: On the attribute redundancy and the learning programs ID3, PRISM, and LEM2. Department of Computer Science, University of Kansas, TR-91-14, December 1991, 20 p. (1991)
Google Scholar
Chmielewski, M.R., Grzymala-Busse, J.W.: Global discretization of continuous attributes as preprocessing for machine learning. Int. Journal of Approximate Reasoning 15, 319–331 (1996)
Article MATH Google Scholar
Dougherty, J., Kohavi, R., Sahami, M.: Supervised and unsupervised discretization of continuous features. In: Proc of the 12th Int. Conf. on Machine Learning, Tahoe City, CA, July 9–12, pp. 194–202 (1995)
Google Scholar
Fayyad, U.M., Irani, K.B.: Multi-interval discretization of continuous-valued attributes for classification learning. In: Proc. of the 13th Int. Joint Conference on AI, Chambery, France, pp. 1022–1027 (1993)
Google Scholar
Grzymala-Busse, J.W.: LERS—A system for learning from examples based on rough sets. In: Slowinski, R. (ed.) Intelligent Decision Support. Handbook of Applications and Advances of the Rough Set Theory, pp. 3–18. Kluwer Academic Publishers, Dordrecht (1992)
Google Scholar
Grzymala-Busse, J.W.: A new version of the rule induction system LERS. Fundamenta Informaticae 31, 27–39 (1997)
MATH Google Scholar
Grzymala-Busse, J.W.: Discretization of numerical attributes. In: Klösgen, W., Zytkow, J. (eds.) Handbook of Data Mining and Knowledge Discovery, pp. 218–225. Oxford University Press, New York (2002)
Google Scholar
Grzymala-Busse, J.W.: MLEM2: A new algorithm for rule induction from imperfect data. In: Proc. of the 9th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU 2002, Annecy, France, pp. 243–250 (2002)
Google Scholar
Grzymala-Busse, J.W.: A comparison of three strategies to rule induction from data with numerical attributes. In: Proc. of the Int. Workshop on Rough Sets in Knowledge Discovery (RSKD 2003), in conjunction with the European Joint Conferences on Theory and Practice of Software, Warsaw, pp. 132–140 (2003)
Google Scholar
Grzymala-Busse, J.W.: Rough set strategies to data with missing attribute values. In: Workshop Notes, Foundations and New Directions of Data Mining, in conjunction with the 3rd International Conference on Data Mining, Melbourne, FL, pp. 56–63 (2003)
Google Scholar
Grzymala-Busse, J.W.: Data with missing attribute values: Generalization of indiscernibility relation and rule induction. In: Peters, J.F., Skowron, A., Grzymała-Busse, J.W., Kostek, B.z., Świniarski, R.W., Szczuka, M.S. (eds.) Transactions on Rough Sets I. LNCS, vol. 3100, pp. 78–95. Springer, Heidelberg (2004)
Google Scholar
Grzymala-Busse, J.W.: Incomplete data and generalization of indiscernibility relation, definability, and approximations. In: Ślęzak, D., Wang, G., Szczuka, M.S., Düntsch, I., Yao, Y. (eds.) RSFDGrC 2005. LNCS (LNAI), vol. 3641, pp. 244–253. Springer, Heidelberg (2005)
Chapter Google Scholar
Grzymala-Busse, J.W.: Mining numerical data—A rough set approach. In: Kryszkiewicz, M., Peters, J.F., Rybiński, H., Skowron, A. (eds.) RSEISP 2007. LNCS (LNAI), vol. 4585, pp. 12–21. Springer, Heidelberg (2007)
Chapter Google Scholar
Grzymala-Busse, J.W., Stefanowski, J.: Discretization of numerical attributes by direct use of the rule induction algorithm LEM2 with interval extension. In: Proc. of the Sixth Symposium on Intelligent Information Systems (IIS 1997), Zakopane, Poland, pp. 149–158 (1997)
Google Scholar
Grzymala-Busse, J.W., Stefanowski, J.: Three discretization methods for rule induction. Int. Journal of Intelligent Systems 16, 29–38 (2001)
Article MATH Google Scholar
Gunn, J.D., Grzymala-Busse, J.W.: Global temperature stability by rule induction: An interdisciplinary bridge. Human Ecology 22, 59–81 (1994)
Article Google Scholar
Kerber, R.: ChiMerge: Discretization of numeric attributes. In: Proc. of the 10th National Conf. on AI, San Jose, CA, pp. 123–128 (1992)
Google Scholar
Kohavi, R., Sahami, M.: Error-based and entropy-based discretization of continuous features. In: Proc of the 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, OR, pp. 114–119 (1996)
Google Scholar
Liu, H., Hussain, F., Tan, C.L., Dash, M.: Discretization: An enabling technique. Data Mining and Knowledge Discovery 6, 393–423 (2002)
Article MathSciNet Google Scholar
Nguyen, H.S., Nguyen, S.H.: Discretization methods for data mining. In: Polkowski, L., Skowron, A. (eds.) Rough Sets in Knowledge Discovery, pp. 451–482. Physica, Heidelberg (1998)
Google Scholar
Pawlak, Z.: Rough Sets. International Journal of Computer and Information Sciences 11, 341–356 (1982)
Article MATH MathSciNet Google Scholar
Pawlak, Z.: Rough Sets. Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht (1991)
MATH Google Scholar
Pensa, R.G., Leschi, C., Besson, J., Boulicaut, J.F.: Assessment of discretization techniques for relevant pattern discovery from gene expression data. In: Proc. of the 4th ACM SIGKDD Workshop on Data Mining in Bioinformatics, pp. 24–30 (2004)
Google Scholar
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo (1993)
Google Scholar
Stefanowski, J.: Handling continuous attributes in discovery of strong decision rules. In: Polkowski, L., Skowron, A. (eds.) RSCTC 1998. LNCS (LNAI), vol. 1424, pp. 394–401. Springer, Heidelberg (1998)
Chapter Google Scholar
Stefanowski, J.: Algorithms of Decision Rule Induction in Data Mining. Poznan University of Technology Press, Poznan (2001)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS, 66045, USA
Jerzy W. Grzymala-Busse
Institute of Computer Science Polish Academy of Sciences, 01-237, Warsaw, Poland
Jerzy W. Grzymala-Busse

Authors

Jerzy W. Grzymala-Busse
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

University of Manitoba, Department of Electrical and Computer Engineering, , , ,, Winnipeg, R3T 5V6, Manitoba, Canada
James F. Peters
Institute of Mathematics, Warsaw University, Banacha 2, 02-097, Warsaw, Poland
Andrzej Skowron

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Grzymala-Busse, J.W. (2010). Mining Numerical Data – A Rough Set Approach. In: Peters, J.F., Skowron, A. (eds) Transactions on Rough Sets XI. Lecture Notes in Computer Science, vol 5946. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11479-3_1

Download citation

DOI: https://doi.org/10.1007/978-3-642-11479-3_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11478-6
Online ISBN: 978-3-642-11479-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Mining Numerical Data – A Rough Set Approach

Abstract

Access this chapter

Preview

Similar content being viewed by others

Rough Sets in Machine Learning: A Review

Rseslib 3: Open Source Library of Rough Set and Machine Learning Methods

Rough-Set-Base Data Analysis: Theoretical Basis and Applications

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Publish with us

Navigation

Mining Numerical Data – A Rough Set Approach

Abstract

Access this chapter

Preview

Similar content being viewed by others

Rough Sets in Machine Learning: A Review

Rseslib 3: Open Source Library of Rough Set and Machine Learning Methods

Rough-Set-Base Data Analysis: Theoretical Basis and Applications

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Share this chapter

Publish with us

Search

Navigation