Abstract
Imbalanced data poses a great difficulty for most classifier learning algorithms. However, as recent works claim, class imbalance is not a problem in itself; performance degradation is also associated with other factors related to the data distribution, such as the presence of noisy and borderline examples in the areas surrounding class boundaries.
This contribution proposes to extend SMOTE with a noise filter called the Iterative-Partitioning Filter (IPF), which can overcome these problems. The properties of this proposal are examined in a controlled experimental study against SMOTE and its best-known generalizations. The results show that the new proposal performs better than existing SMOTE generalizations across all of these scenarios.
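To make the combination concrete, the following is a minimal Python sketch of a SMOTE + IPF-style pipeline. SMOTE is taken from the imbalanced-learn library; the filter below is a simplified illustration of the iterative-partitioning idea, not the authors' exact procedure. The function names (`ipf_filter`, `smote_ipf`), the decision-tree base learner (standing in for the C4.5 classifier used in the original IPF), the number of partitions, the voting scheme, and the stopping rule are all illustrative assumptions.

```python
# Sketch of SMOTE oversampling followed by an IPF-style ensemble noise filter.
# Assumptions: scikit-learn / imbalanced-learn APIs, decision trees as base
# learners, and a simplified stopping rule (the original IPF stops after
# several consecutive iterations with few removed examples).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from imblearn.over_sampling import SMOTE


def ipf_filter(X, y, n_partitions=5, stop_fraction=0.01, max_iterations=10,
               vote="majority", random_state=0):
    """Iteratively remove examples misclassified by partition-trained classifiers."""
    rng = np.random.RandomState(random_state)
    X, y = np.asarray(X), np.asarray(y)
    for _ in range(max_iterations):
        n = len(y)
        # Split the current data into disjoint random partitions and train
        # one classifier on each partition.
        parts = np.array_split(rng.permutation(n), n_partitions)
        classifiers = [
            DecisionTreeClassifier(random_state=random_state).fit(X[p], y[p])
            for p in parts
        ]
        # Each classifier votes on every example of the whole data set.
        errors = np.array([clf.predict(X) != y for clf in classifiers])
        if vote == "consensus":
            noisy = errors.all(axis=0)                    # all classifiers must err
        else:
            noisy = errors.sum(axis=0) > n_partitions / 2  # majority of errors
        # Simplified stopping criterion: halt when few examples are flagged.
        if noisy.sum() <= stop_fraction * n:
            break
        X, y = X[~noisy], y[~noisy]
    return X, y


def smote_ipf(X, y, k_neighbors=5, random_state=0, **ipf_params):
    """Oversample the minority class with SMOTE, then clean with the IPF-style filter."""
    X_res, y_res = SMOTE(k_neighbors=k_neighbors,
                         random_state=random_state).fit_resample(X, y)
    return ipf_filter(X_res, y_res, random_state=random_state, **ipf_params)
```

In this sketch the filter is applied after oversampling, so it can remove both noisy original examples and harmful synthetic ones introduced near the class boundary; the parameter values shown are placeholders rather than the configuration evaluated in the paper.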
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Sáez, J.A., Luengo, J., Stefanowski, J., Herrera, F. (2014). Managing Borderline and Noisy Examples in Imbalanced Classification by Combining SMOTE with Ensemble Filtering. In: Corchado, E., Lozano, J.A., Quintián, H., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2014. IDEAL 2014. Lecture Notes in Computer Science, vol 8669. Springer, Cham. https://doi.org/10.1007/978-3-319-10840-7_8
DOI: https://doi.org/10.1007/978-3-319-10840-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10839-1
Online ISBN: 978-3-319-10840-7
eBook Packages: Computer Science, Computer Science (R0)