An Algorithm for Selective Preprocessing of Multi-class Imbalanced Data

  • Conference paper
Proceedings of the 10th International Conference on Computer Recognition Systems CORES 2017 (CORES 2017)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 578)

Abstract

In this paper we propose a new algorithm called SPIDER3 for selective preprocessing of multi-class imbalanced data sets. While it borrows selected ideas (i.e., the combination of relabeling and local resampling) from its predecessor, SPIDER2, it introduces several important extensions. Unlike SPIDER2, it can handle multi-class problems directly. Moreover, it considers the relevance of specific decision classes to control the order in which they are processed. Finally, it uses information about relations between specific classes (modeled with misclassification costs) to better control the extent of changes introduced locally to the preprocessed data. We performed a computational experiment on artificial 3-class data sets to evaluate SPIDER3 and compare it to SPIDER2 applied with temporarily aggregated classes; the results confirmed the advantages of the new algorithm.
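
The abstract names two mechanisms: ordering classes by relevance and using misclassification costs to bound local changes. The sketch below is only a rough illustration of those two ideas and is not the SPIDER3 algorithm, whose steps are not given here. Everything in it is an assumption made for the example: class frequency stands in for relevance, the function names (`processing_order`, `k_nearest`, `amplification_plan`) are hypothetical, and the cost table keyed by `(own_class, neighbour_class)` is invented. Relabeling is left out, so the class order affects only the iteration order in this toy version.

```python
from collections import Counter
from math import dist


def processing_order(labels):
    """Order classes from rarest to most frequent.

    Assumption: frequency is used as a stand-in for class relevance.
    """
    counts = Counter(labels)
    return sorted(counts, key=lambda c: (counts[c], c))


def k_nearest(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean, excluding i)."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: dist(points[i], points[j]))[:k]


def amplification_plan(points, labels, costs, k=3):
    """Map each example index to a number of copies to add locally.

    The count is the summed cost of confusing the example's class with the
    classes of its differing neighbours (hypothetical rule, for illustration).
    """
    plan = {}
    for cls in processing_order(labels):
        for i, label in enumerate(labels):
            if label != cls:
                continue
            neighbours = k_nearest(points, i, k)
            penalty = sum(
                costs[(cls, labels[j])] for j in neighbours if labels[j] != cls
            )
            plan[i] = round(penalty)
    return plan


# Toy 3-class data on a line: four majority, two intermediate, one minority example.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (2.5,)]
labels = ["maj", "maj", "maj", "maj", "mid", "mid", "min"]
# Invented costs: confusing a rarer class with a more frequent one costs more.
costs = {
    ("min", "maj"): 2.0, ("min", "mid"): 1.0,
    ("mid", "maj"): 1.0, ("mid", "min"): 1.0,
    ("maj", "min"): 0.0, ("maj", "mid"): 0.0,
}
plan = amplification_plan(points, labels, costs, k=3)
```

With these invented costs, the single minority example surrounded by majority neighbours receives the largest amplification, while majority examples are left unchanged.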

Notes

  1. http://www.cs.waikato.ac.nz/ml/weka/.

  2. See the on-line appendix available at http://www.cs.put.poznan.pl/swilk/cores2017/spider3-appendix.pdf.

Acknowledgments

The authors would like to acknowledge support by the Polish National Science Center under Grant No. DEC-2013/11/B/ST6/00963.

Corresponding author

Correspondence to Szymon Wilk.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Wojciechowski, S., Wilk, S., Stefanowski, J. (2018). An Algorithm for Selective Preprocessing of Multi-class Imbalanced Data. In: Kurzynski, M., Wozniak, M., Burduk, R. (eds) Proceedings of the 10th International Conference on Computer Recognition Systems CORES 2017. CORES 2017. Advances in Intelligent Systems and Computing, vol 578. Springer, Cham. https://doi.org/10.1007/978-3-319-59162-9_25

  • DOI: https://doi.org/10.1007/978-3-319-59162-9_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59161-2

  • Online ISBN: 978-3-319-59162-9

  • eBook Packages: Engineering, Engineering (R0)
