Generalized class-specific kernelized extreme learning machine for multiclass imbalanced learning

https://doi.org/10.1016/j.eswa.2018.12.024

Highlights

  • Handles multiclass imbalanced problems.

  • Proposes a generalized class-specific kernelized extreme learning machine (GCSKELM).

  • Training time is significantly lower than that of the kernelized weighted extreme learning machine.

  • Benchmark results confirm the effectiveness of the proposed classifier.

Abstract

Class imbalanced learning is a well-known issue that arises in real-world applications. Datasets with skewed class distributions hinder traditional learning algorithms: because traditional classifiers give the same importance to every sample, their predictions are biased towards the majority classes. To address this deficiency, numerous strategies have been proposed, such as the weighted extreme learning machine (WELM), the weighted support vector machine (WSVM), the class-specific extreme learning machine (CS-ELM) and the class-specific kernelized extreme learning machine (CSKELM). This work focuses on multiclass imbalance problems, which are more difficult than binary class imbalance problems. The kernelized extreme learning machine (KELM) yields better results than the traditional extreme learning machine (ELM), which uses random input parameters. This work presents a generalized CSKELM (GCSKELM), an extension of our recently proposed CSKELM, which addresses multiclass imbalanced problems more effectively. The proposed GCSKELM can be applied directly to solve multiclass imbalanced problems. GCSKELM with the Gaussian kernel function avoids the non-optimal hidden node problem associated with CS-ELM and other existing variants of ELM. The proposed work also has a lower computational cost than the kernelized WELM (KWELM) for multiclass imbalanced learning. This work employs class-specific regularization parameters, which are determined from the class proportions. Extensive experimental analysis shows that the proposed work achieves promising generalization performance in comparison with other state-of-the-art imbalanced learning methods.

Introduction

Over the past decades, class imbalanced problems have drawn increasing attention from the data mining community, and many machine learning algorithms have been proposed (Haixiang, Yijing, Shang, Mingyun, Yuanyue, Bing, 2017, He, Garcia, 2009, Sarmanova, Albayrak, 2013). Almost all real-world classification problems exhibit some disparity in class proportions. Classes whose number of samples falls below the average number of samples per class are termed minority classes; classes whose number of samples exceeds this average are termed majority classes. Examples of class imbalance problems include fraud detection (Wei, Li, Cao, Ou, Chen, 2013, Zakaryazad, Duman, 2016), software defect prediction (Wang & Yao, 2013) and cancer malignancy grading (Krawczyk, Galar, Jeleń, & Herrera, 2016). The problem with class imbalance learning is that conventional classifiers usually misclassify most of the minority class samples as majority class samples. For example, in cancer malignancy grading, most patients are normal, i.e., patients having cancer are rare, yet it is important to recognize the cancer patients effectively.

Over the last decades, most existing imbalanced learning methods have been designed for binary class scenarios (He, Garcia, 2009, Lim, Goh, Tan, 2017). Multiclass imbalanced problems are more difficult than binary imbalanced problems (Wang & Yao, 2012), and it has been shown in Wang and Yao (2012) and Abdi and Hashemi (2016) that many issues in multiclass imbalanced learning remain unresolved. Sáez, Krawczyk, and Woźniak (2016) observed that a considerably unequal distribution of samples across classes is not the only source of difficulty for machine learning methods; additional challenges arise from the structure of the dataset itself. For example, a small sample size, small disjuncts (a minority class may consist of several sub-concepts) and class overlapping, as illustrated in Fig. 1, raise the difficulty of multiclass imbalance problems. It is therefore desirable to design a more effective and efficient algorithm to deal with multiclass imbalanced problems.

ELM, proposed by Huang, Zhu, and Siew (2006), has drawn increasing attention among researchers around the world. It is a single-hidden-layer feedforward neural network (SLFN) with random weights between the input and the hidden layer. It computes the weights between the hidden layer and the output layer analytically by employing the Moore-Penrose (MP) pseudoinverse. This makes ELM much faster than standard neural networks, which require great effort in hyper-parameter tuning. It has been stated in Janakiraman, Nguyen, Sterniak, and Assanis (2015) and Janakiraman, Nguyen, and Assanis (2016) that the standard ELM does not handle the class imbalance problem effectively. Several modifications of ELM, such as WELM (Zong, Huang, & Chen, 2013), Boosting WELM (Li, Kong, Lu, Wenyin, & Yin, 2014), the Regularized Weighted Circular Complex-valued ELM (Shukla & Yadav, 2015), CCR-ELM (Xiao, Zhang, Li, Zhang, & Yang, 2017), class imbalance learning using UnderBagging based KELM (Raghuwanshi & Shukla, 2018a), class-specific cost-sensitive boosting WELM (Raghuwanshi & Shukla, 2018b), CS-ELM (Raghuwanshi & Shukla, 2018c), CSKELM (Raghuwanshi & Shukla, 2018d) and UnderBagging reduced KWELM (Raghuwanshi & Shukla, 2018e), have been designed to address the class imbalance problem effectively.

As mentioned above, random weights are employed to transform the input data into the feature space. These weights remain unchanged throughout the training phase; consequently, a number of samples, usually those near the decision boundary, are misclassified in certain realizations. It has been stated in Iosifidis and Gabbouj (2015) and Iosifidis, Tefas, and Pitas (2015) that KELM generalizes better than the standard ELM on small and medium-sized datasets. KELM also runs faster than the standard ELM on most such datasets; on larger datasets, however, KELM is comparatively slower (Iosifidis, Gabbouj, 2015, Iosifidis, Tefas, Pitas, 2015).

WELM (Zong et al., 2013) minimizes the weighted cumulative error with respect to each sample. WELM uses two weighting schemes to assign class-wise weights to the samples; both increase the impact of the minority class while diminishing the relative impact of the majority class. It has been stated in He and Ma (2013) that the Lagrangian multipliers α corresponding to the minority class samples need to be larger than those of the majority class samples. Accordingly, this work strengthens the regularization parameter of the minority class samples relative to that of the majority class samples, so that additional attention is paid to the minority class samples.
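For concreteness, the first of these weighting schemes (W1 in Zong et al., 2013) assigns each sample the inverse of its class size. The following minimal sketch illustrates it; the function name is ours, and the snippet is only a sketch of the scheme, not the authors' code.

```python
import numpy as np

def welm_weights_w1(y):
    """W1 scheme of WELM (Zong et al., 2013): a sample of class k
    receives the weight 1 / N_k, where N_k is the size of class k."""
    classes, counts = np.unique(y, return_counts=True)
    weight_of = {c: 1.0 / n for c, n in zip(classes, counts)}
    return np.array([weight_of[label] for label in y])

# Example: a 3-class imbalanced label vector; minority samples get larger weights.
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2])
print(welm_weights_w1(y))  # ≈ [0.167 ... 0.167, 0.5, 0.5, 1.0]
```

In WELM these per-sample weights enter the objective as a diagonal matrix that scales the squared training errors, which is what boosts the minority class's influence on the solution.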

This work proposes GCSKELM, an extension of CSKELM to the multiclass imbalance problem, which uses the Gaussian kernel function to transform the input data into the kernel space. As mentioned above, the proposed work utilizes class-specific regularization parameters whose values are computed using the class proportions. GCSKELM differs from CS-ELM in that it does not utilize random weights to transform the input data into the feature space. Similar to ELM (Huang, Zhou, Ding, & Zhang, 2012) and WELM (Zong et al., 2013), GCSKELM can be applied directly to multiclass imbalanced problems with good generalization performance.

The rest of the paper is organized as follows: Section 2 elaborates the related work in detail. Section 3 explains the proposed work. Section 4 presents the experimental setup and the result analysis. The last section concludes the paper along with future directions.

Section snippets

Extreme learning machine

ELM is a single-hidden-layer feedforward neural network proposed by Huang et al. (2012, 2006). As mentioned above, ELM is much faster than standard neural networks, which require great effort in hyper-parameter tuning. ELM randomly assigns the input weights and the hidden layer biases. The output weights are easy to compute because ELM uses a linear activation function at the output layer. Given $N$ training samples $\{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N}$, the input vector is $\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in \mathbb{R}^n$ and its corresponding target vector is $\mathbf{t}_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T \in \mathbb{R}^m$.
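A minimal sketch of this training procedure, with illustrative names and a sigmoid hidden-layer activation chosen for concreteness:

```python
import numpy as np

def elm_train(X, T, L, seed=0):
    """Basic ELM: X is (N, n) inputs, T is (N, m) one-hot targets,
    L is the number of hidden nodes."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], L))  # random input weights (never updated)
    b = rng.standard_normal(L)                # random hidden biases (never updated)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))    # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T              # output weights via MP pseudoinverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                           # linear output; argmax gives the class
```

Only beta is learned; the random W and b stay fixed, which is what reduces training to a single linear solve.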

Generalized class-specific kernelized extreme learning machine

In this paper, GCSKELM is proposed as an extension of CSKELM, designed to address multiclass imbalanced problems more effectively. This work does not require assigning a weight to each training sample; instead, it utilizes class-specific regularization parameters whose values are determined from the class proportions and the value of the regularization parameter. Thus, the optimization problem of GCSKELM can be formulated as follows:

$$\text{Minimize: } \frac{1}{2}\|\beta\|^2 + \frac{1}{2}\sum_{k=1}^{m}\left(\frac{N}{N_k}\right)\times\frac{C}{N}\sum_{\{i \,:\, \mathbf{x}_i \in \text{class } k\}}\|\xi_i\|^2$$

where $N$ is the total number of training samples, $N_k$ is the number of samples in class $k$, $m$ is the number of classes, $C$ is the regularization parameter and $\xi_i$ is the training error of sample $i$.
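Under this objective, the effective regularization of a class-$k$ sample is $(N/N_k)\times C/N = C/N_k$, so minority classes receive larger regularization weights. One plausible reading of the resulting kernel solution replaces the usual KELM term $I/C$ with a diagonal matrix of inverse class-specific parameters; the sketch below follows that reading with a Gaussian kernel. All names are illustrative, and this is a sketch under the stated assumptions, not the authors' reference implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def gcskelm_train(X, y, T, C=1.0, sigma=1.0):
    """Sketch of a class-specific kernelized solve (see assumptions above).
    X: (N, n) inputs, y: (N,) labels, T: (N, m) one-hot targets."""
    Omega = np.exp(-cdist(X, X, "sqeuclidean") / (2.0 * sigma**2))  # Gaussian kernel matrix
    N = len(y)
    _, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    C_i = (N / counts[inverse]) * (C / N)  # class-specific regularization, = C / N_k per sample
    # beta = (diag(1/C_i) + Omega)^{-1} T, a per-sample analogue of KELM's (I/C + Omega)^{-1} T
    return np.linalg.solve(np.diag(1.0 / C_i) + Omega, T)

def gcskelm_predict(X_new, X_train, beta, sigma=1.0):
    K = np.exp(-cdist(X_new, X_train, "sqeuclidean") / (2.0 * sigma**2))
    return K @ beta  # argmax over columns gives the predicted class
```

Because no random input weights are involved, the transformation is determined entirely by the kernel, which is the property that distinguishes GCSKELM from CS-ELM.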

Dataset description

The experiments were performed to evaluate the proposed classifier on 22 datasets obtained from online repositories, including the UCI Machine Learning Repository (Dheeru & Karra Taniskidou, 2017) and the KEEL data repository (Alcalá et al., 2011). These datasets are available in 5-fold cross-validation format. The imbalance ratio (IR) differs from dataset to dataset and is calculated as follows:

$$\text{Multiclass: } IR = \frac{\min_k(\#t_k)}{\max_k(\#t_k)}, \quad k = 1, 2, 3, \ldots, m$$

where $\#t_k$ denotes the number of samples belonging to class $k$.
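A small helper showing this IR computation (names are illustrative):

```python
import numpy as np

def imbalance_ratio(y):
    """Multiclass IR = min_k(#t_k) / max_k(#t_k); values near 0 indicate
    heavy imbalance, while 1 indicates a perfectly balanced dataset."""
    _, counts = np.unique(y, return_counts=True)
    return counts.min() / counts.max()

y = np.array([0] * 90 + [1] * 9 + [2] * 1)
print(imbalance_ratio(y))  # 1/90 ≈ 0.0111
```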

Conclusion

Multiclass imbalance problems have been recognized in many practical domains. However, most machine learning algorithms tend to be overwhelmed by the majority classes and to ignore the minority classes, since they were originally developed for balanced classification problems. This work presents an extension of CSKELM to address multiclass imbalanced problems. The proposed work utilizes class-specific regularization parameters, which can be determined by utilizing the class proportion and the value of the regularization parameter.

Authorship contribution

S. Shukla and B.S. Raghuwanshi: Conceptualization

B.S. Raghuwanshi: Data collection from standard dataset repositories

B.S. Raghuwanshi: Experimentation

B.S. Raghuwanshi, S. Shukla: Result analysis

B.S. Raghuwanshi: Supervision

References (43)

  • K. Li et al., Boosting weighted ELM for imbalanced learning, Neurocomputing (2014)

  • B. Mirza et al., Meta-cognitive online sequential extreme learning machine for imbalanced and concept-drifting data classification, Neural Networks (2016)

  • L. Nanni et al., Coupling different methods for overcoming the class imbalance problem, Neurocomputing (2015)

  • B.S. Raghuwanshi et al., Class-specific extreme learning machine for handling binary class imbalance problem, Neural Networks (2018)

  • B.S. Raghuwanshi et al., Class-specific kernelized extreme learning machine for binary class imbalance learning, Applied Soft Computing (2018)

  • B.S. Raghuwanshi et al., UnderBagging based reduced kernelized weighted extreme learning machine for class imbalance learning, Engineering Applications of Artificial Intelligence (2018)

  • J.A. Sáez et al., Analyzing the oversampling of different classes and types of examples in multi-class imbalanced datasets, Pattern Recognition (2016)

  • W. Xiao et al., Class-specific cost regulation extreme learning machine for imbalanced classification, Neurocomputing (2017)

  • A. Zakaryazad et al., A profit-driven artificial neural network (ANN) with applications to fraud detection and direct marketing, Neurocomputing (2016)

  • W. Zong et al., Weighted extreme learning machine for imbalance learning, Neurocomputing (2013)

  • L. Abdi et al., To combat multi-class imbalanced problems by means of over-sampling techniques, IEEE Transactions on Knowledge and Data Engineering (2016)