Generalized class-specific kernelized extreme learning machine for multiclass imbalanced learning

https://doi.org/10.1016/j.eswa.2018.12.024

Highlights

  • Handles multiclass imbalanced problems.

  • Proposes a generalized class-specific kernelized extreme learning machine (GCSKELM).

  • Training time is significantly lower than that of the kernelized weighted extreme learning machine.

  • Benchmark results confirm the effectiveness of the proposed classifier.

Abstract

Class imbalanced learning is a well-known issue that arises in real-world applications. Datasets with skewed class distributions hinder traditional learning algorithms: because traditional classifiers give the same importance to every sample, their predictions are biased towards the majority classes. To address this deficiency, numerous strategies have been proposed, such as the weighted extreme learning machine (WELM), the weighted support vector machine (WSVM), the class-specific extreme learning machine (CS-ELM) and the class-specific kernelized extreme learning machine (CSKELM). This work focuses on multiclass imbalance problems, which are more difficult than binary class imbalance problems. The kernelized extreme learning machine (KELM) yields better results than the traditional extreme learning machine (ELM), which uses random input parameters. This work presents a generalized CSKELM (GCSKELM), an extension of our recently proposed CSKELM, which addresses multiclass imbalanced problems more effectively. The proposed GCSKELM can be applied directly to solve multiclass imbalanced problems. GCSKELM with the Gaussian kernel function avoids the non-optimal hidden node problem associated with CS-ELM and other existing variants of ELM. The proposed work also has a lower computational cost than the kernelized WELM (KWELM) for multiclass imbalanced learning. This work employs class-specific regularization parameters, which are determined from the class proportions. Extensive experimental analysis shows that the proposed work achieves promising generalization performance in comparison with other state-of-the-art imbalanced learning methods.

Introduction

Over the past decades, class imbalanced problems have drawn increasing attention from the data mining community, and many machine learning algorithms have been proposed (Haixiang, Yijing, Shang, Mingyun, Yuanyue, Bing, 2017, He, Garcia, 2009, Sarmanova, Albayrak, 2013). Almost all real-world classification problems exhibit some disparity in class proportions. Classes whose number of samples falls below the average number of samples per class are termed minority classes; classes whose number of samples exceeds this average are termed majority classes. Examples of class imbalance problems include fraud detection (Wei, Li, Cao, Ou, Chen, 2013, Zakaryazad, Duman, 2016), software defect prediction (Wang & Yao, 2013) and cancer malignancy grading (Krawczyk, Galar, Jeleń, & Herrera, 2016). The problem with class imbalance learning is that conventional classifiers usually misclassify most of the minority class samples as majority class samples. For example, in cancer malignancy grading, most patients are normal, i.e., patients having cancer are rare, yet it is important to recognize the cancer patients effectively.

Over the last decades, most existing imbalanced learning methods have been designed for binary class scenarios (He, Garcia, 2009, Lim, Goh, Tan, 2017). Multiclass imbalanced problems are more difficult than binary imbalanced problems (Wang & Yao, 2012), and it has been shown in Wang and Yao (2012) and Abdi and Hashemi (2016) that many issues in multiclass imbalanced learning remain unresolved. Sáez, Krawczyk, and Woźniak (2016) observed that a considerably unequal distribution of samples across classes is not the only source of difficulty for machine learning methods; additional challenges arise from the structure of the dataset itself. For example, a small sample size, small disjuncts (a minority class may consist of several sub-concepts) and class overlapping, as illustrated in Fig. 1, raise the difficulty of multiclass imbalance problems. It is therefore desirable to design a more effective and efficient algorithm to deal with multiclass imbalanced problems.

ELM, proposed by Huang, Zhu, and Siew (2006), has drawn increasing attention among researchers around the world. It is a single-hidden-layer feedforward neural network (SLFN) with random weights between the input and the hidden layer. It computes the weights between the hidden layer and the output layer analytically by employing the Moore-Penrose (MP) pseudoinverse. This makes ELM much faster than standard neural networks, which require great effort in hyper-parameter tuning. It has been stated in Janakiraman, Nguyen, Sterniak, and Assanis (2015) and Janakiraman, Nguyen, and Assanis (2016) that the standard ELM does not handle the class imbalance problem effectively. Several modifications of ELM, such as WELM (Zong, Huang, & Chen, 2013), Boosting WELM (Li, Kong, Lu, Wenyin, & Yin, 2014), the Regularized Weighted Circular Complex-valued ELM (Shukla & Yadav, 2015), CCR-ELM (Xiao, Zhang, Li, Zhang, & Yang, 2017), class imbalance learning using UnderBagging based KELM (Raghuwanshi & Shukla, 2018a), class-specific cost-sensitive boosting WELM (Raghuwanshi & Shukla, 2018b), CS-ELM (Raghuwanshi & Shukla, 2018c), CSKELM (Raghuwanshi & Shukla, 2018d) and UnderBagging reduced KWELM (Raghuwanshi & Shukla, 2018e), have been designed to address the class imbalance problem effectively.

As mentioned above, random weights are employed to transform the input data into the feature space. These weights remain unchanged throughout the training phase; consequently, a number of samples, usually those near the decision boundary, are misclassified in certain realizations. It has been stated in Iosifidis and Gabbouj (2015) and Iosifidis, Tefas, and Pitas (2015) that KELM generalizes better than the standard ELM on small and medium-sized datasets. KELM also runs faster than the standard ELM on most such datasets; on larger datasets, however, KELM is comparatively slower (Iosifidis, Gabbouj, 2015, Iosifidis, Tefas, Pitas, 2015).

WELM (Zong et al., 2013) minimizes the weighted cumulative error with respect to each sample. WELM uses two weighting schemes to assign class-wise weights to the samples; both increase the impact of the minority class while diminishing the relative impact of the majority class. It has been stated in He and Ma (2013) that the Lagrangian multipliers α corresponding to the minority class samples need to be larger than those of the majority class samples. Accordingly, this work strengthens the regularization parameter of the minority class samples relative to that of the majority class samples, so that additional attention is paid to the minority class samples.
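For concreteness, the first of these weighting schemes (W1 in Zong et al., 2013) assigns each sample the inverse of its class size. The following minimal sketch illustrates it; the function name is ours, and the snippet is only a sketch of the scheme, not the authors' code.

```python
import numpy as np

def welm_weights_w1(y):
    """W1 scheme of WELM (Zong et al., 2013): a sample of class k
    receives the weight 1 / N_k, where N_k is the size of class k."""
    classes, counts = np.unique(y, return_counts=True)
    weight_of = {c: 1.0 / n for c, n in zip(classes, counts)}
    return np.array([weight_of[label] for label in y])

# Example: a 3-class imbalanced label vector; minority samples get larger weights.
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2])
print(welm_weights_w1(y))  # ≈ [0.167 ... 0.167, 0.5, 0.5, 1.0]
```

In WELM these per-sample weights enter the objective as a diagonal matrix that scales the squared training errors, which is what boosts the minority class's influence on the solution.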

This work proposes GCSKELM, an extension of CSKELM to the multiclass imbalance problem, which uses the Gaussian kernel function to transform the input data into the kernel space. As mentioned above, the proposed work utilizes class-specific regularization parameters whose values are computed using the class proportions. GCSKELM differs from CS-ELM in that it does not utilize random weights to transform the input data into the feature space. Similar to ELM (Huang, Zhou, Ding, & Zhang, 2012) and WELM (Zong et al., 2013), GCSKELM can be applied directly to multiclass imbalanced problems with good generalization performance.

The rest of the paper is organized as follows: Section 2 elaborates the related work in detail. Section 3 explains the proposed work. Section 4 presents the experimental setup and the result analysis. The last section concludes the paper along with future directions.

Section snippets

Extreme learning machine

ELM is a single-hidden-layer feedforward neural network proposed by Huang et al. (2012, 2006). As mentioned above, ELM is much faster than standard neural networks, which require great effort in hyper-parameter tuning. ELM randomly assigns the input weights and the hidden layer biases. The output weights are easy to compute because ELM uses a linear activation function at the output layer. Given $N$ training samples $\{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N}$, the input vector is $\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in \mathbb{R}^n$ and its corresponding target vector is $\mathbf{t}_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T \in \mathbb{R}^m$.
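A minimal sketch of this training procedure, with illustrative names and a sigmoid hidden-layer activation chosen for concreteness:

```python
import numpy as np

def elm_train(X, T, L, seed=0):
    """Basic ELM: X is (N, n) inputs, T is (N, m) one-hot targets,
    L is the number of hidden nodes."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], L))  # random input weights (never updated)
    b = rng.standard_normal(L)                # random hidden biases (never updated)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))    # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T              # output weights via MP pseudoinverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                           # linear output; argmax gives the class
```

Only beta is learned; the random W and b stay fixed, which is what reduces training to a single linear solve.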

Generalized class-specific kernelized extreme learning machine

In this paper, GCSKELM is proposed as an extension of CSKELM, designed to address multiclass imbalanced problems more effectively. This work does not require assigning a weight to each training sample; instead, it utilizes class-specific regularization parameters whose values are determined from the class proportions and the value of the regularization parameter. Thus, the optimization problem of GCSKELM can be formulated as follows:

$$\text{Minimize: } \frac{1}{2}\|\beta\|^2 + \frac{1}{2}\sum_{k=1}^{m}\left(\frac{N}{N_k}\right)\times\frac{C}{N}\sum_{\{i \,:\, \mathbf{x}_i \in \text{class } k\}}\|\xi_i\|^2$$

where $N$ is the total number of training samples, $N_k$ is the number of samples in class $k$, $m$ is the number of classes, $C$ is the regularization parameter and $\xi_i$ is the training error of sample $i$.
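Under this objective, the effective regularization of a class-$k$ sample is $(N/N_k)\times C/N = C/N_k$, so minority classes receive larger regularization weights. One plausible reading of the resulting kernel solution replaces the usual KELM term $I/C$ with a diagonal matrix of inverse class-specific parameters; the sketch below follows that reading with a Gaussian kernel. All names are illustrative, and this is a sketch under the stated assumptions, not the authors' reference implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def gcskelm_train(X, y, T, C=1.0, sigma=1.0):
    """Sketch of a class-specific kernelized solve (see assumptions above).
    X: (N, n) inputs, y: (N,) labels, T: (N, m) one-hot targets."""
    Omega = np.exp(-cdist(X, X, "sqeuclidean") / (2.0 * sigma**2))  # Gaussian kernel matrix
    N = len(y)
    _, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    C_i = (N / counts[inverse]) * (C / N)  # class-specific regularization, = C / N_k per sample
    # beta = (diag(1/C_i) + Omega)^{-1} T, a per-sample analogue of KELM's (I/C + Omega)^{-1} T
    return np.linalg.solve(np.diag(1.0 / C_i) + Omega, T)

def gcskelm_predict(X_new, X_train, beta, sigma=1.0):
    K = np.exp(-cdist(X_new, X_train, "sqeuclidean") / (2.0 * sigma**2))
    return K @ beta  # argmax over columns gives the predicted class
```

Because no random input weights are involved, the transformation is determined entirely by the kernel, which is the property that distinguishes GCSKELM from CS-ELM.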

Dataset description

The experiments were performed to evaluate the proposed classifier on 22 datasets obtained from online repositories, including the UCI Machine Learning Repository (Dheeru & Karra Taniskidou, 2017) and the KEEL data repository (Alcalá et al., 2011). These datasets are available in 5-fold cross-validation format. The imbalance ratio (IR) differs from dataset to dataset and is calculated as follows:

$$\text{Multiclass: } IR = \frac{\min_k(\#t_k)}{\max_k(\#t_k)}, \quad k = 1, 2, 3, \ldots, m$$

where $\#t_k$ denotes the number of samples belonging to class $k$.
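A small helper showing this IR computation (names are illustrative):

```python
import numpy as np

def imbalance_ratio(y):
    """Multiclass IR = min_k(#t_k) / max_k(#t_k); values near 0 indicate
    heavy imbalance, while 1 indicates a perfectly balanced dataset."""
    _, counts = np.unique(y, return_counts=True)
    return counts.min() / counts.max()

y = np.array([0] * 90 + [1] * 9 + [2] * 1)
print(imbalance_ratio(y))  # 1/90 ≈ 0.0111
```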

Conclusion

Multiclass imbalance problems have been recognized in many practical domains. However, most machine learning algorithms tend to be overwhelmed by the majority classes and to ignore the minority classes, since they were originally developed for balanced classification problems. This work presents an extension of CSKELM to address multiclass imbalanced problems. The proposed work utilizes class-specific regularization parameters, which can be determined by utilizing the class proportion and the value of the regularization parameter.

Authorship contribution

S. Shukla and B.S. Raghuwanshi: Conceptualization

B.S. Raghuwanshi: Data collection from standard dataset repositories

B.S. Raghuwanshi: Experimentation

B.S. Raghuwanshi, S. Shukla: Result analysis

B.S. Raghuwanshi: Supervision

References (43)

  • K. Li et al., Boosting weighted ELM for imbalanced learning, Neurocomputing (2014)

  • B. Mirza et al., Meta-cognitive online sequential extreme learning machine for imbalanced and concept-drifting data classification, Neural Networks (2016)

  • L. Nanni et al., Coupling different methods for overcoming the class imbalance problem, Neurocomputing (2015)

  • B.S. Raghuwanshi et al., Class-specific extreme learning machine for handling binary class imbalance problem, Neural Networks (2018)

  • B.S. Raghuwanshi et al., Class-specific kernelized extreme learning machine for binary class imbalance learning, Applied Soft Computing (2018)

  • B.S. Raghuwanshi et al., UnderBagging based reduced kernelized weighted extreme learning machine for class imbalance learning, Engineering Applications of Artificial Intelligence (2018)

  • J.A. Sáez et al., Analyzing the oversampling of different classes and types of examples in multi-class imbalanced datasets, Pattern Recognition (2016)

  • W. Xiao et al., Class-specific cost regulation extreme learning machine for imbalanced classification, Neurocomputing (2017)

  • A. Zakaryazad et al., A profit-driven artificial neural network (ANN) with applications to fraud detection and direct marketing, Neurocomputing (2016)

  • W. Zong et al., Weighted extreme learning machine for imbalance learning, Neurocomputing (2013)

  • L. Abdi et al., To combat multi-class imbalanced problems by means of over-sampling techniques, IEEE Transactions on Knowledge and Data Engineering (2016)