Radial-Based Approach to Imbalanced Data Oversampling

  • Conference paper
Hybrid Artificial Intelligent Systems (HAIS 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10334)

Abstract

The difficulty of many practical decision problems lies in the nature of the analyzed data. One of the most important characteristics of real data is the imbalance among examples from different classes. Despite more than two decades of research, imbalanced data classification is still one of the vital challenges to be addressed. Traditional classification algorithms display strongly biased performance on imbalanced datasets. One of the most popular ways to deal with this problem is to modify the learning set to decrease the disproportion between objects from different classes using over- or undersampling approaches. In this work, a novel preprocessing technique for imbalanced datasets is presented, which takes into consideration the mutual density class distribution. The proposed approach has been evaluated on the basis of computer experiments carried out on benchmark datasets. The results seem to confirm the usefulness of the proposed concept in comparison to state-of-the-art methods.

Acknowledgements

This work was supported by the Polish National Science Center under grant no. UMO-2015/19/B/ST6/01597, as well as by the PLGrid Infrastructure.

Author information

Corresponding author

Correspondence to Michał Koziarski.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Koziarski, M., Krawczyk, B., Woźniak, M. (2017). Radial-Based Approach to Imbalanced Data Oversampling. In: Martínez de Pisón, F., Urraca, R., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2017. Lecture Notes in Computer Science, vol 10334. Springer, Cham. https://doi.org/10.1007/978-3-319-59650-1_27

  • DOI: https://doi.org/10.1007/978-3-319-59650-1_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59649-5

  • Online ISBN: 978-3-319-59650-1

  • eBook Packages: Computer Science; Computer Science (R0)
