An overlap sensitive neural network for class imbalanced data

Shahee, Shaukat Ali; Ananthakumar, Usha

doi:10.1007/s10618-021-00766-4

Published: 18 May 2021

Volume 35, pages 1654–1687, (2021)

Data Mining and Knowledge Discovery

Shaukat Ali Shahee¹ & Usha Ananthakumar¹

Abstract

Class imbalance is a well-known challenge in machine learning. It occurs when one class dominates the other in the number of observations. Because of this imbalance, conventional classifiers fail to classify the minority class correctly. The challenge becomes even more severe when class overlap occurs in imbalanced data. Although the literature offers methods that deal with class imbalance and class overlap sequentially, these methods are quite complex and not very efficient. In this paper, we propose an overlap-sensitive artificial neural network that handles class overlap and class imbalance simultaneously, along with noisy and outlier observations. The strength of this method lies in identifying the overlapping observations rather than the overlapping region, and in not using multiple classifiers, unlike other existing methods. The key idea of the proposed method is to weight each observation based on its location in the feature space before training the neural network. The performance of the proposed method is evaluated on 12 simulated data sets and 23 real-life data sets and compared with other well-known methods. The results clearly indicate the strength and ability of the proposed method across a wide variety of imbalance ratios and levels of overlap. It is also shown that the proposed method is statistically superior to the other methods in terms of different performance measures.
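
The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea of location-based observation weights, not the authors' method. It assumes a hypothetical scheme: each observation starts from an inverse class-frequency weight and is scaled down by the fraction of its k nearest neighbours that belong to the other class, so that points lying deep inside the opposite class (possible noise or outliers) contribute little. The function names, the choice of k, and the direction of the adjustment are all illustrative assumptions.

```python
import numpy as np

def knn_disagreement(X, y, k=3):
    """Fraction of each point's k nearest neighbours with a different label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances (adequate for small data sets).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]
    return (y[nn] != y[:, None]).mean(axis=1)

def observation_weights(X, y, k=3):
    """Hypothetical weights: inverse class frequency, reduced in overlap."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    class_w = {c: len(y) / (len(classes) * n) for c, n in zip(classes, counts)}
    base = np.array([class_w[c] for c in y])
    # Assumption: the more opposite-class neighbours, the smaller the weight.
    return base * (1.0 - knn_disagreement(X, y, k))

def weighted_log_loss(p, y, w):
    """Binary cross-entropy in which each observation counts with weight w."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)
    y = np.asarray(y, dtype=float)
    return -(w * (y * np.log(p) + (1 - y) * np.log(1 - p))).sum() / w.sum()

# Toy data: six majority points, three clustered minority points, and one
# minority point sitting inside the majority cluster.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8],
     [5, 5], [5, 6], [6, 5], [0.6, 0.4]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
w = observation_weights(X, y, k=3)
```

In this toy data the last minority point has only majority-class neighbours, so its weight is zero, while the clustered minority points keep most of their class weight. Weights of this kind would then enter training through a per-observation weighted loss such as `weighted_log_loss`.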

Notes

http://www.kdd.org/kdd-cup.

Author information

Authors and Affiliations

SJM School of Management, Indian Institute of Technology Bombay, Bombay, 400076, India
Shaukat Ali Shahee & Usha Ananthakumar

Corresponding author

Correspondence to Usha Ananthakumar.

Additional information

Responsible editor: Pierre Baldi.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shahee, S.A., Ananthakumar, U. An overlap sensitive neural network for class imbalanced data. Data Min Knowl Disc 35, 1654–1687 (2021). https://doi.org/10.1007/s10618-021-00766-4

Received: 09 October 2019
Accepted: 06 May 2021
Published: 18 May 2021
Issue Date: July 2021
DOI: https://doi.org/10.1007/s10618-021-00766-4
