New Data Level Approach for Imbalanced Data Classification Improvement

  • Conference paper
Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 403))

Abstract

The article addresses the problem of imbalanced data classification. An algorithm that improves on the standard SMOTE method is proposed and tested. It combines existing approaches and was designed to be more versatile than similar solutions. Distances between objects are measured with either the Euclidean or the HVDM metric, depending on the number of nominal attributes in the data set.
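
For orientation, the baseline that the abstract refers to can be sketched as follows. This is a simplified illustration of standard SMOTE-style oversampling, not the improved algorithm proposed in the paper, whose details are not given here. The function names `euclidean` and `smote`, the parameter names, and the choice to pick base samples at random are all assumptions made for this sketch. The `distance` argument is a hypothetical hook showing where a different metric, such as HVDM for data with nominal attributes, could be substituted.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, distance=euclidean, rng=None):
    """Generate synthetic minority samples by interpolating between neighbours.

    Simplified sketch: each synthetic point lies on the line segment between
    a randomly chosen minority sample and one of its k nearest minority
    neighbours. Numeric attributes only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base sample.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: distance(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

Calling `smote(minority, 10, k=2)` on a list of numeric feature vectors returns ten new vectors, each lying between two existing minority samples. Passing a different callable as `distance` changes only how neighbours are ranked; the interpolation step as written would still need adapting for nominal attributes.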

Author information

Corresponding author

Correspondence to Magdalena Topczewska.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Borowska, K., Topczewska, M. (2016). New Data Level Approach for Imbalanced Data Classification Improvement. In: Burduk, R., Jackowski, K., Kurzyński, M., Woźniak, M., Żołnierek, A. (eds) Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015. Advances in Intelligent Systems and Computing, vol 403. Springer, Cham. https://doi.org/10.1007/978-3-319-26227-7_27

  • DOI: https://doi.org/10.1007/978-3-319-26227-7_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26225-3

  • Online ISBN: 978-3-319-26227-7

  • eBook Packages: Engineering, Engineering (R0)
