Abstract
We present an approach to mining numerical data based on rough set theory, using the calculus of attribute-value blocks. An algorithm implementing these ideas, called MLEM2, induces high-quality rules in terms of both simplicity (number of rules and total number of conditions) and accuracy. Additionally, MLEM2 induces rules not only from complete data sets but also from data with missing attribute values, with or without numerical attributes.
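The abstract does not spell out how attribute-value blocks are formed, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes that a block is the set of cases matching an attribute-value pair, that a numerical attribute is handled with cutpoints taken as midpoints between consecutive sorted values, and that a missing value is marked `"?"` and left out of every block. The function names and the `"?"` marker are hypothetical.

```python
from collections import defaultdict

LOST = "?"  # assumed marker for a missing attribute value


def symbolic_blocks(column):
    """Map each known value to the set of case indices that have it."""
    blocks = defaultdict(set)
    for case, value in enumerate(column):
        if value != LOST:  # assumption: missing values join no block
            blocks[value].add(case)
    return dict(blocks)


def numerical_blocks(column):
    """Build interval blocks for a numerical attribute.

    Cutpoints are midpoints between consecutive distinct sorted values.
    Each cutpoint yields two blocks: the cases below it and the cases above it.
    """
    known = [(case, v) for case, v in enumerate(column) if v != LOST]
    values = sorted({v for _, v in known})
    if len(values) < 2:
        return {}  # no cutpoint can be formed
    lo, hi = values[0], values[-1]
    blocks = {}
    for a, b in zip(values, values[1:]):
        cut = (a + b) / 2
        blocks[(lo, cut)] = {case for case, v in known if v < cut}
        blocks[(cut, hi)] = {case for case, v in known if v > cut}
    return blocks


if __name__ == "__main__":
    # Five cases; case 3 has a missing value for this attribute.
    for interval, cases in numerical_blocks([4, 4, 6, LOST, 10]).items():
        print(interval, sorted(cases))
```

A rule inducer in this style would then pick blocks whose intersection falls inside the set of cases it is trying to describe. That selection step is not shown here.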
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Grzymala-Busse, J.W. (2007). Mining Numerical Data—A Rough Set Approach. In: Kryszkiewicz, M., Peters, J.F., Rybinski, H., Skowron, A. (eds) Rough Sets and Intelligent Systems Paradigms. RSEISP 2007. Lecture Notes in Computer Science, vol 4585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73451-2_3
DOI: https://doi.org/10.1007/978-3-540-73451-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73450-5
Online ISBN: 978-3-540-73451-2
eBook Packages: Computer Science, Computer Science (R0)