Abstract
Our objective is a comparison of two data mining approaches to dealing with imbalanced data sets. The first approach is based on saving the original rule set, induced by the LEM2 algorithm, and changing the rule strength for all rules for the smaller class (concept) during classification. In the second approach, rule induction was split: the rule set for the larger class was induced by LEM2, while the rule set for the smaller class was induced by EXPLORE, another data mining algorithm. Results of our experiments show that both approaches increase the sensitivity compared to the original LEM2. However, the difference in performance of both approaches is statistically insignificant. Thus the appropriate approach to dealing with imbalanced data sets should be selected individually for a specific data set.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Bairagi, R., Suchindran, C.M.: An estimator of the cutoff point maximizing sum of sensitivity and specificity. Sankhya, Series B, Indian Journal of Statistics 51, 263–269 (1989)
Grzymala-Busse, J.W.: LERS—a system for learning from examples based on rough sets. In: Slowinski, R. (ed.) Intelligent Decision Support. Handbook of Applications and Advances of the Rough Set Theory, pp. 3–18. Kluwer Academic Publishers, Dordrecht, Boston, London (1992)
Grzymala-Busse, J.W.: A new version of the rule induction system LERS. Fundamenta Informaticae 31, 27–39 (1997)
Grzymala-Busse, J.W., Goodwin, L.K., Grzymala-Busse, W.J., Zheng, X.: An approach to imbalanced data sets based on changing rule strength. In: Learning from Imbalanced Data Sets, AAAI Workshop at the 17th Conference on AI, AAAI-2000, Austin, TX, pp. 69–74, July 30–31 (2000)
Grzymala-Busse, J.W., Goodwin, L.K., Zhang, X.: Increasing sensitivity of preterm birth by changing rule strengths. In: Proceedings of the Eigth Workshop on Intelligent Information Systems (IIS 1999), Ustron, Poland, June 14–18, pp. 127–136 (1999)
Japkowicz, N.: Learning from imbalanced data sets: a comparison of various strategies. In: Learning from Imbalanced Data Sets, AAAI Workshop at the 17th Conference on AI, AAAI-2000, Austin, TX, pp. 10–17 (July 30–31, 2000)
Stefanowski, J.: On rough set based approaches to induction of decision rules. In: Skowron, A., Polkowski, L. (eds.) Rough Sets in Knowledge Discovery, pp. 500–529. Physica Verlag, Heidelberg (1998)
Stefanowski, J., Vanderpooten, D.: Induction of decision rules in classification and discovery-oriented perspectives. International Journal of Intelligent Systems 16, 13–28 (2001)
Stefanowski, J., Wilk, S.: Evaluating business credit risk by means of approach integrating decision rules and case based learning. International Journal of Intelligent Systems in Accounting, Finance and Management 10, 97–114 (2001)
Wilk, S., Slowinski, R., Michalowski, W., Greco, S.: Supporting triage of children with abdominal pain in the emergency room. European Journal of Operation Research (in press)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Grzymala-Busse, J.W., Stefanowski, J., Wilk, S. (2004). A Comparison of Two Approaches to Data Mining from Imbalanced Data. In: Negoita, M.G., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2004. Lecture Notes in Computer Science(), vol 3213. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30132-5_103
Download citation
DOI: https://doi.org/10.1007/978-3-540-30132-5_103
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23318-3
Online ISBN: 978-3-540-30132-5
eBook Packages: Springer Book Archive