A Comparison of Two Approaches to Data Mining from Imbalanced Data

  • Conference paper
Knowledge-Based Intelligent Information and Engineering Systems (KES 2004)

Abstract

Our objective is to compare two data mining approaches to dealing with imbalanced data sets. The first approach keeps the original rule set, induced by the LEM2 algorithm, and changes the rule strength of all rules for the smaller class (concept) during classification. In the second approach, rule induction is split: the rule set for the larger class is induced by LEM2, while the rule set for the smaller class is induced by EXPLORE, another data mining algorithm. Results of our experiments show that both approaches increase sensitivity compared to the original LEM2. However, the difference in performance between the two approaches is statistically insignificant. Thus the appropriate approach to dealing with imbalanced data sets should be selected individually for each specific data set.
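The first approach can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the authors' implementation. It assumes a LERS-style voting scheme in which each fully matching rule contributes its strength times its number of conditions to the support of its concept, and in which the strength of every rule for the smaller class is scaled by a constant multiplier at classification time. The `Rule` class, the `classify` and `sensitivity` functions, the multiplier value, and the toy rules and cases are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """A decision rule: if all conditions hold, predict `concept`."""
    conditions: dict   # attribute name -> required value
    concept: str       # class predicted by the rule
    strength: int      # assumed: training cases correctly covered by the rule

    def matches(self, case):
        return all(case.get(a) == v for a, v in self.conditions.items())


def classify(case, rules, minority, multiplier=1.0):
    """Vote among fully matching rules; return the best concept or None.

    Assumption: support = sum of strength * number of conditions, with the
    strength of minority-class rules scaled by `multiplier`.
    """
    support = {}
    for rule in rules:
        if not rule.matches(case):
            continue
        factor = multiplier if rule.concept == minority else 1.0
        vote = rule.strength * factor * len(rule.conditions)
        support[rule.concept] = support.get(rule.concept, 0.0) + vote
    if not support:
        return None
    return max(support, key=support.get)


def sensitivity(cases, labels, rules, minority, multiplier=1.0):
    """Fraction of minority-class cases that are classified as minority."""
    minority_cases = [c for c, y in zip(cases, labels) if y == minority]
    hits = sum(
        classify(c, rules, minority, multiplier) == minority
        for c in minority_cases
    )
    return hits / len(minority_cases)


# Invented toy rules and cases, for illustration only.
rules = [
    Rule({"fever": "yes"}, "sick", strength=2),
    Rule({"age": "adult"}, "healthy", strength=5),
]
cases = [
    {"fever": "yes", "age": "adult"},
    {"fever": "yes", "age": "child"},
    {"fever": "no", "age": "adult"},
]
labels = ["sick", "sick", "healthy"]

baseline = sensitivity(cases, labels, rules, minority="sick")
boosted = sensitivity(cases, labels, rules, minority="sick", multiplier=3.0)
```

In this toy setting, the first case matches one rule from each class, so the unscaled vote favors the larger class; scaling the minority rule's strength flips that decision, which is the mechanism by which sensitivity can rise. In practice the multiplier would have to be tuned per data set, and raising it can lower specificity.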

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grzymala-Busse, J.W., Stefanowski, J., Wilk, S. (2004). A Comparison of Two Approaches to Data Mining from Imbalanced Data. In: Negoita, M.G., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2004. Lecture Notes in Computer Science, vol 3213. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30132-5_103

  • DOI: https://doi.org/10.1007/978-3-540-30132-5_103

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23318-3

  • Online ISBN: 978-3-540-30132-5

  • eBook Packages: Springer Book Archive