An Algorithm for Selective Preprocessing of Multi-class Imbalanced Data

Wojciechowski, Szymon; Wilk, Szymon; Stefanowski, Jerzy

doi:10.1007/978-3-319-59162-9_25


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 578)

Included in the following conference series:

International Conference on Computer Recognition Systems


Abstract

In this paper we propose a new algorithm, SPIDER3, for selective preprocessing of multi-class imbalanced data sets. While it borrows selected ideas (i.e., the combination of relabeling and local resampling) from its predecessor, SPIDER2, it introduces several important extensions. Unlike SPIDER2, it handles multi-class problems directly. Moreover, it considers the relevance of specific decision classes to control the order in which they are processed. Finally, it uses information about relations between specific classes (modeled with misclassification costs) to better control the extent of changes introduced locally to the preprocessed data. We performed a computational experiment on artificial 3-class data sets to compare SPIDER3 with SPIDER2 applied to temporarily aggregated classes, and the results confirmed the advantages of the new algorithm.
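
The abstract does not spell out the steps of SPIDER3, so the following is only a minimal sketch of the general idea it mentions: using misclassification costs to decide how strongly an example should be amplified based on its local neighbourhood. It is not the authors' algorithm. The function names, the cost dictionary, the choice of Euclidean distance, and the value of k are all assumptions made for illustration.

```python
import math

def k_nearest(index, points, k):
    """Return indices of the k points closest (Euclidean) to points[index]."""
    dists = [
        (math.dist(points[index], p), j)
        for j, p in enumerate(points)
        if j != index
    ]
    dists.sort()  # ties are broken by the smaller index
    return [j for _, j in dists[:k]]

def amplification_weights(points, labels, cost, k=2):
    """Hypothetical cost-guided weighting (illustrative, not SPIDER3).

    cost[a][b] is an assumed cost of misclassifying class a as class b.
    Each example gets the summed cost over those of its k nearest
    neighbours that belong to a different class, so examples of costly
    classes lying among other classes receive the largest weights.
    """
    weights = []
    for i, label in enumerate(labels):
        neighbours = k_nearest(i, points, k)
        weights.append(
            sum(cost[label][labels[j]] for j in neighbours if labels[j] != label)
        )
    return weights

# Toy 3-class data on a line: class B is a single example inside class A.
points = [(0.0,), (1.0,), (2.0,), (2.4,), (10.0,), (11.0,)]
labels = ["A", "A", "A", "B", "C", "C"]
# Assumed costs: errors on the rare class B are the most expensive.
cost = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 5, "C": 3},
    "C": {"A": 2, "B": 2},
}
weights = amplification_weights(points, labels, cost, k=2)
```

In this toy setting the lone B example receives the largest weight because both of its neighbours belong to A and the assumed cost of confusing B with A is high. A preprocessing method could use such weights to decide how many copies of an example to add locally.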


Notes

1. http://www.cs.waikato.ac.nz/ml/weka/.
2. See the on-line appendix available at http://www.cs.put.poznan.pl/swilk/cores2017/spider3-appendix.pdf.


Acknowledgments

The authors would like to acknowledge support by the Polish National Science Center under Grant No. DEC-2013/11/B/ST6/00963.

Author information

Authors and Affiliations

Institute of Computing Science, Piotrowo 2, 60-965, Poznan, Poland
Szymon Wojciechowski, Szymon Wilk & Jerzy Stefanowski


Corresponding author

Correspondence to Szymon Wilk.

Editor information

Editors and Affiliations

Department of Systems and Computer Networks, Wrocław University of Technology, Wrocław, Poland
Marek Kurzynski
Department of Systems and Computer Networks, Wrocław University of Technology, Wrocław, Poland
Michal Wozniak
Department of Systems and Computer Networks, Wrocław University of Technology, Wrocław, Poland
Robert Burduk

Cite this paper

Wojciechowski, S., Wilk, S., Stefanowski, J. (2018). An Algorithm for Selective Preprocessing of Multi-class Imbalanced Data. In: Kurzynski, M., Wozniak, M., Burduk, R. (eds) Proceedings of the 10th International Conference on Computer Recognition Systems CORES 2017. Advances in Intelligent Systems and Computing, vol 578. Springer, Cham. https://doi.org/10.1007/978-3-319-59162-9_25


DOI: https://doi.org/10.1007/978-3-319-59162-9_25
Published: 07 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59161-2
Online ISBN: 978-3-319-59162-9
eBook Packages: Engineering, Engineering (R0)
