Classifying imbalanced data using SMOTE based class-specific kernelized ELM

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

In machine learning, a problem is imbalanced when its class distribution is highly skewed. Imbalanced classification problems arise in many application domains and hinder conventional learning algorithms. Several approaches have been proposed to handle imbalanced learning; for example, weighted kernel-based SMOTE (WKSMOTE) and the SMOTE based class-specific extreme learning machine (SMOTE-CSELM) are recently proposed algorithms that use minority oversampling. As shown in Raghuwanshi and Shukla (Knowl-Based Syst 187:104814, 2020), our recently proposed classifier SMOTE-CSELM outperforms other state-of-the-art classifiers for class imbalance learning. One drawback of SMOTE-CSELM is performance fluctuation caused by the random initialization of the weights between the input and the hidden layer. To address this problem, this work proposes the SMOTE based class-specific kernelized extreme learning machine (SMOTE-CSKELM), which uses the Gaussian kernel function to map the input data to the feature space. The proposed method combines the advantages of minority oversampling and class-specific regularization coefficients. With the Gaussian kernel function, SMOTE-CSKELM also avoids the non-optimal hidden node problem associated with the sigmoid node based variants of ELM. To increase the influence of the region corresponding to the minority class on the decision boundary, the synthetic minority oversampling technique (SMOTE) is applied to generate synthetic minority instances that balance the training dataset. The proposed method has training time comparable to that of the kernelized weighted extreme learning machine (KWELM) for imbalanced learning. It is evaluated on benchmark real-world imbalanced datasets, and extensive experimental results show that it outperforms other state-of-the-art methods for imbalanced learning.
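
To make the method description concrete, the sketch below reconstructs the two stages of SMOTE-CSKELM in Python: SMOTE first balances the training set by interpolating between minority-class neighbours, and a Gaussian-kernel ELM is then solved in closed form with separate regularization coefficients for the two classes. This is a minimal illustration assembled from the abstract, not the authors' implementation; the function names, the particular closed form used (alpha = (D + K)^{-1} y with per-class entries on the diagonal of D), the coefficient values, and the toy data are all assumptions.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_synth, k=5, seed=0):
    """Create n_synth synthetic minority points by interpolating between a
    random minority point and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nbrs.kneighbors(X_min)                      # column 0 is the point itself
    seeds = rng.integers(0, len(X_min), n_synth)         # minority points to expand
    picks = idx[seeds, rng.integers(1, k + 1, n_synth)]  # one random true neighbour each
    gap = rng.random((n_synth, 1))                       # interpolation factor in [0, 1)
    return X_min[seeds] + gap * (X_min[picks] - X_min[seeds])

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def fit_cskelm(X, y, minority=1, C_min=2.0**6, C_maj=2.0**0, sigma=1.0):
    """Closed-form kernel ELM with class-specific regularization: solve
    (D + K) alpha = y, where D_ii = 1/C_min on minority rows and 1/C_maj
    on majority rows. C_min and C_maj are illustrative values."""
    K = gaussian_kernel(X, X, sigma)
    d = np.where(y == minority, 1.0 / C_min, 1.0 / C_maj)
    alpha = np.linalg.solve(np.diag(d) + K, y.astype(float))
    return lambda Xq: np.sign(gaussian_kernel(Xq, X, sigma) @ alpha)

# Toy usage: oversample the minority class to balance, then train.
rng = np.random.default_rng(42)
X_maj = rng.normal(0.0, 1.0, size=(200, 2))   # majority class, label -1
X_min = rng.normal(2.5, 0.5, size=(40, 2))    # minority class, label +1
X_min_bal = np.vstack([X_min, smote(X_min, len(X_maj) - len(X_min))])
X = np.vstack([X_maj, X_min_bal])
y = np.concatenate([-np.ones(len(X_maj)), np.ones(len(X_min_bal))])
predict = fit_cskelm(X, y, minority=1, sigma=1.5)
print("training accuracy:", (predict(X) == y).mean())

Forming and solving the N x N kernel system dominates the training cost in this sketch, which is consistent with the abstract's remark that training time is comparable to KWELM: both pay for a kernel-matrix solve rather than for hidden-node tuning.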


References

  1. Raghuwanshi BS, Shukla S (2020) SMOTE based class-specific extreme learning machine for imbalanced learning. Knowl-Based Syst 187:104814

  2. Haixiang G, Yijing L, Shang J, Mingyun G, Yuanyue H, Bing G (2017) Learning from class-imbalanced data: review of methods and applications. Expert Syst Appl 73:220–239

  3. He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284

  4. López V, Fernández A, García S, Palade V, Herrera F (2013) An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inf Sci 250:113–141

  5. Das S, Datta S, Chaudhuri BB (2018) Handling data irregularities in classification: foundations, trends, and future challenges. Pattern Recogn 81:674–693

  6. Parvin H, Minaei-Bidgoli B, Alizadeh H (2011) Detection of cancer patients using an innovative method for learning at imbalanced datasets. In: Yao J, Ramanna S, Wang G, Suraj Z (eds) Rough sets and knowledge technology. Springer, Berlin, pp 376–381

  7. Kubat M, Holte RC, Matwin S (1998) Machine learning for the detection of oil spills in satellite radar images. Mach Learn 30(2):195–215

  8. Wang S, Yao X (2013) Using class imbalance learning for software defect prediction. IEEE Trans Reliab 62(2):434–443

  9. Krawczyk B, Galar M, Jeleń Ł, Herrera F (2016) Evolutionary undersampling boosting for imbalanced classification of breast cancer malignancy. Appl Soft Comput 38:714–726

  10. Galar M, Fernández A, Barrenechea E, Bustince H, Herrera F (2012) A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Trans Syst Man Cybern Part C (Appl Rev) 42(4):463–484

  11. Krawczyk B (2016) Learning from imbalanced data: open challenges and future directions. Prog Artif Intell 5(4):221–232

  12. Liu XY, Wu J, Zhou ZH (2009) Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B (Cybern) 39(2):539–550

  13. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357

  14. Sun J, Shang Z, Li H (2014) Imbalance-oriented SVM methods for financial distress prediction: a comparative study among the new SB-SVM-ensemble method and traditional methods. J Oper Res Soc 65(12):1905–1919

  15. Chawla NV, Lazarevic A, Hall LO, Bowyer KW (2003) SMOTEBoost: improving prediction of the minority class in boosting. In: Lavrač N, Gamberger D, Todorovski L, Blockeel H (eds) Knowledge discovery in databases: PKDD 2003. Springer, Berlin, pp 107–119

  16. Barua S, Islam MM, Yao X, Murase K (2014) MWMOTE: majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans Knowl Data Eng 26(2):405–425

  17. Batista GEAPA, Prati RC, Monard MC (2004) A study of the behavior of several methods for balancing machine learning training data. SIGKDD Explor Newsl 6(1):20–29

  18. Kovács G (2019) An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets. Appl Soft Comput 83:105662

  19. Fernández A, García S, Herrera F, Chawla NV (2018) SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary. J Artif Intell Res 61:863–905

  20. Elreedy D, Atiya AF (2019) A comprehensive analysis of synthetic minority oversampling technique (SMOTE) for handling class imbalance. Inf Sci 505:32–64

  21. He H, Bai Y, Garcia EA, Li S (2008) ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence), pp 1322–1328

  22. Bunkhumpornpat C, Sinapiromsaran K, Lursinsap C (2009) Safe-Level-SMOTE: safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. In: Theeramunkong T, Kijsirikul B, Cercone N, Ho TB (eds) Advances in knowledge discovery and data mining. Springer, Heidelberg, pp 475–482

  23. Han H, Wang WY, Mao BH (2005) Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: Huang DS, Zhang XP, Huang GB (eds) Advances in intelligent computing. Springer, Berlin, pp 878–887

  24. Mathew J, Pang CK, Luo M, Leong WH (2018) Classification of imbalanced data by oversampling in kernel space of support vector machines. IEEE Trans Neural Netw Learn Syst 29(9):4065–4076

  25. Tang Y, Zhang YQ, Chawla NV, Krasser S (2009) SVMs modeling for highly imbalanced classification. IEEE Trans Syst Man Cybern Part B (Cybern) 39(1):281–288

  26. Cieslak DA, Hoens TR, Chawla NV, Kegelmeyer WP (2012) Hellinger distance decision trees are robust and skew-insensitive. Data Min Knowl Disc 24(1):136–158

  27. Akbani R, Kwek S, Japkowicz N (2004) Applying support vector machines to imbalanced datasets. In: Boulicaut JF, Esposito F, Giannotti F, Pedreschi D (eds) Machine learning: ECML 2004. Springer, Heidelberg, pp 39–50

  28. Zhou ZH, Liu XY (2006) Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans Knowl Data Eng 18(1):63–77

  29. Sun Y, Kamel MS, Wong AKC, Wang Y (2007) Cost-sensitive boosting for classification of imbalanced data. Pattern Recogn 40(12):3358–3378

  30. Zhou ZH, Liu XY (2010) On multi-class cost-sensitive learning. Comput Intell 26(3):232–257

  31. Zong W, Huang GB, Chen Y (2013) Weighted extreme learning machine for imbalance learning. Neurocomputing 101:229–242

  32. Yang X, Song Q, Wang Y (2007) A weighted support vector machine for data classification. Int J Pattern Recognit Artif Intell 21(5):961–976

  33. Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1–3):489–501

  34. Huang GB, Wang D, Lan Y (2011) Extreme learning machines: a survey. Int J Mach Learn Cybern 2:107–122

  35. Huang G, Huang GB, Song S, You K (2015) Trends in extreme learning machines: a review. Neural Netw 61:32–48

  36. Zhu QY, Qin AK, Suganthan PN, Huang GB (2005) Evolutionary extreme learning machine. Pattern Recogn 38(10):1759–1763

  37. Huang GB, Zhou H, Ding X, Zhang R (2012) Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern Part B (Cybern) 42(2):513–529

  38. Janakiraman VM, Nguyen X, Sterniak J, Assanis D (2015) Identification of the dynamic operating envelope of HCCI engines using class imbalance learning. IEEE Trans Neural Netw Learn Syst 26(1):98–112

  39. Janakiraman VM, Nguyen X, Assanis D (2016) Stochastic gradient based extreme learning machines for stable online learning of advanced combustion engines. Neurocomputing 177:304–316

  40. Li K, Kong X, Lu Z, Wenyin L, Yin J (2014) Boosting weighted ELM for imbalanced learning. Neurocomputing 128:15–21

  41. Raghuwanshi BS, Shukla S (2018) Class-specific kernelized extreme learning machine for binary class imbalance learning. Appl Soft Comput 73:1026–1038

  42. Xiao W, Zhang J, Li Y, Zhang S, Yang W (2017) Class-specific cost regulation extreme learning machine for imbalanced classification. Neurocomputing 261:70–82

  43. Raghuwanshi BS, Shukla S (2018) Class-specific extreme learning machine for handling binary class imbalance problem. Neural Netw 105:206–217

  44. Raghuwanshi BS, Shukla S (2019) Generalized class-specific kernelized extreme learning machine for multiclass imbalanced learning. Expert Syst Appl 121:244–255

  45. Shukla S, Raghuwanshi BS (2019) Online sequential class-specific extreme learning machine for binary imbalanced learning. Neural Netw 119:235–248

  46. Raghuwanshi BS, Shukla S (2019) Class imbalance learning using UnderBagging based kernelized extreme learning machine. Neurocomputing 329:172–187

  47. Raghuwanshi BS, Shukla S (2018) UnderBagging based reduced kernelized weighted extreme learning machine for class imbalance learning. Eng Appl Artif Intell 74:252–270

  48. Raghuwanshi BS, Shukla S (2019) Classifying imbalanced data using ensemble of reduced kernelized weighted extreme learning machine. Int J Mach Learn Cybern 10(11):3071–3097

  49. Keerthi SS, Lin CJ (2003) Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Comput 15(7):1667–1689

  50. Zhao YP (2016) Parsimonious kernel extreme learning machine in primal via Cholesky factorization. Neural Netw 80:95–109

  51. Iosifidis A, Gabbouj M (2015) On the kernel extreme learning machine speedup. Pattern Recogn Lett 68:205–210

  52. Iosifidis A, Tefas A, Pitas I (2015) On the kernel extreme learning machine classifier. Pattern Recogn Lett 54:11–17

  53. He H, Ma Y (2013) Class imbalance learning methods for support vector machines. Wiley, New York, p 216

  54. Zeng ZQ, Gao J (2009) Improving SVM classification with imbalance data set. In: Leung CS, Lee M, Chan JH (eds) Neural information processing. Springer, Heidelberg, pp 389–398

  55. Gao M, Hong X, Chen S, Harris CJ (2011) A combined SMOTE and PSO based RBF classifier for two-class imbalanced problems. Neurocomputing 74(17):3456–3466

  56. Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27

  57. Deng W, Zheng Q, Chen L (2009) Regularized extreme learning machine. In: IEEE symposium on computational intelligence and data mining, pp 389–395

  58. Schölkopf B (2000) The kernel trick for distances. In: Proceedings of the 13th international conference on neural information processing systems (NIPS'00). MIT Press, Cambridge, pp 283–289

  59. Sun Y, Kamel MS, Wang Y (2006) Boosting for learning multiple classes with imbalanced class distribution. In: Sixth international conference on data mining (ICDM'06), pp 592–602. https://doi.org/10.1109/ICDM.2006.29

  60. Oneto L (2018) Model selection and error estimation without the agonizing pain. WIREs Data Min Knowl Discov 8(4):e1252. https://doi.org/10.1002/widm.1252

  61. Oneto L, Ridella S, Anguita D (2019) Local Rademacher complexity machine. Neurocomputing 342:24–32

  62. Hoerl AE, Kennard RW (2000) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 42(1):80–86

  63. Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml

  64. Alcalá J, Fernández A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2010) KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Multiple Valued Logic Soft Comput 17(2–3):255–287

  65. Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn 30(7):1145–1159

  66. Fawcett T (2003) ROC graphs: notes and practical considerations for researchers. Tech Rep HPL-2003-4, HP Labs

  67. Huang J, Ling CX (2005) Using AUC and accuracy in evaluating learning algorithms. IEEE Trans Knowl Data Eng 17(3):299–310

  68. Hand DJ, Till RJ (2001) A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach Learn 45(2):171–186

  69. Tang K, Wang R, Chen T (2011) Towards maximizing the area under the ROC curve for multi-class classification problems. In: Proceedings of the twenty-fifth AAAI conference on artificial intelligence (AAAI'11), pp 483–488

  70. Seiffert C, Khoshgoftaar TM, Hulse JV, Napolitano A (2010) RUSBoost: a hybrid approach to alleviating class imbalance. IEEE Trans Syst Man Cybern Part A Syst Humans 40(1):185–197

  71. Nanni L, Fantozzi C, Lazzarini N (2015) Coupling different methods for overcoming the class imbalance problem. Neurocomputing 158:48–61

  72. Wilcoxon F (1945) Individual comparisons by ranking methods. Biometrics Bull 1(6):80–83

  73. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  74. Corder GW, Foreman DI (2009) Nonparametric statistics for non-statisticians. Wiley, Hoboken. https://doi.org/10.1002/9781118165881

  75. Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22(1):4–37

  76. Macià N, Bernadó-Mansilla E, Orriols-Puig A, Ho TK (2013) Learner excellence biased by data set selection: a case for data characterisation and artificial data sets. Pattern Recogn 46(3):1054–1066

  77. Wolpert DH (1996) The lack of a priori distinctions between learning algorithms. Neural Comput 8(7):1341–1390

Author information

Corresponding author

Correspondence to Sanyam Shukla.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Raghuwanshi, B.S., Shukla, S. Classifying imbalanced data using SMOTE based class-specific kernelized ELM. Int. J. Mach. Learn. & Cyber. 12, 1255–1280 (2021). https://doi.org/10.1007/s13042-020-01232-1

