Neurocomputing

Volume 403, 25 August 2020, Pages 360-370

Boosting label weighted extreme learning machine for classifying multi-label imbalanced data

https://doi.org/10.1016/j.neucom.2020.04.098

Highlights

  • The LW-ELM algorithm is integrated into the Boosting ensemble learning framework.

  • The BLW-ELM algorithm is designed to address the multi-label imbalanced classification problem.

  • BLW-ELM adaptively tunes label weights without exploring the prior distribution directly.

  • Experimental results indicate that the BLW-ELM algorithm is robust and time-saving.

Abstract

As a flexible and efficient cost-sensitive learning algorithm, the label weighted extreme learning machine (LW-ELM) has been proposed to address the class imbalance learning problem on multi-label data. However, owing to its adoption of empirical costs, the classification performance of LW-ELM cannot be fully guaranteed. To solve this problem, an improved algorithm called BLW-ELM, which integrates LW-ELM into the Boosting ensemble learning framework, is presented in this paper. Specifically, BLW-ELM assigns an appropriate cost to each training label of each training instance according to the iterative feedback of the training results, and thus avoids exploring the intricate distribution of multi-label data directly. That is to say, BLW-ELM is a universal and self-adaptive algorithm that promotes the robustness of classification regardless of the data distribution type. Twelve multi-label data sets are used to verify the effectiveness and superiority of the proposed algorithm. Experimental results indicate that the proposed BLW-ELM algorithm is significantly superior to the LW-ELM algorithm and many other state-of-the-art multi-label imbalance learning algorithms, and that it generally needs far less training time than those sophisticated algorithms.

Introduction

In supervised learning, single-label learning, in which each instance associates with only one unique label, is the most studied paradigm. In real-world applications, however, an object may be simultaneously related to multiple different labels: for example, an image may include the labels mountain, lake, cascade, tree, cloud, sky and sun simultaneously (see Fig. 1); a news report may cover several different topics, such as economy, politics and sport; and a protein may hold several different biological functions synchronously. We refer to this type of data as multi-label data and to the modeling procedure as multi-label learning. In the past decade, multi-label learning has gradually developed into one of the research hotspots in the field of machine learning [1].

Most existing multi-label learning studies focus on how to improve the recognition rate, either by modifying a single-label learning model to adapt it to multi-label data [2], [3], or by mining the correlations among labels to promote the quality of the model [4], [5], [6], but in general they ignore the issue of class imbalance. In fact, multi-label learning faces a greater threat from class imbalance than single-label learning, as for each label in multi-label data the instances might have a seriously skewed distribution [7]. We note that, as another research hotspot, a wealth of class imbalance learning methods already exist, including sampling [8], [9], [10], [11], [12], cost-sensitive learning [13], [14], [15], [16], threshold strategies [17], [18], [19], [20], one-class learning [21], [22], metric learning [23], [24] and ensemble learning [25], [26], [27], [28], [29]. However, most of them are designed to address only the single-label classification problem, and it is difficult to transform them directly to deal with multi-label imbalanced data.

In recent years, some researchers have noted the multi-label imbalanced classification problem and have proposed several effective solutions [7], [30], [31], [32], [33], [34], [35], [36]. With these techniques, the impact of a class-imbalanced distribution can be alleviated to some extent; however, each one has its inherent drawbacks, e.g., low robustness caused by empirical or random manipulation, or high time complexity due to the adoption of complex calculations.

The label weighted extreme learning machine (LW-ELM) [34] is an efficient class imbalance learning algorithm that can be used to classify multi-label data. As a cost-sensitive learning algorithm, LW-ELM not only inherits the robustness and fast training speed of the extreme learning machine (ELM), but also presents higher flexibility than the other cost-sensitive extreme learning machine algorithm, WELM [37]. Although the LW-ELM algorithm holds several remarkable merits, we note that its modeling quality is limited because, in general, the label costs are designated empirically. Considering that the empirical costs are associated only with the class imbalance ratio and neglect the specific data distribution, the quality of the model constructed by LW-ELM can be further improved.

In this paper, we draw on the idea of the Boosting WELM algorithm [38], which integrates the WELM model into the Boosting ensemble learning framework, and propose a novel algorithm named BLW-ELM. BLW-ELM first empirically designates the initial label costs based on the class imbalance ratios, then adjusts the cost of each label belonging to each instance according to the feedback of the current model, and finally organizes all the trained models to make decisions in the form of weighted voting. The advantage of the BLW-ELM algorithm lies in that it can significantly improve the generalization ability of LW-ELM while avoiding exploring the prior distribution of the multi-label data.
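The feedback loop just described can be sketched as an AdaBoost-style reweighting over (instance, label) pairs. The update rule and the stand-in base learner below are illustrative assumptions, not the paper's exact BLW-ELM formulas; `train_base` represents training an LW-ELM-like model under the current cost matrix:

```python
import numpy as np

def boost_label_weights(Y, train_base, rounds=10):
    """Boost a cost-sensitive base learner over a binary label matrix Y.

    Y: (N, m) matrix, Y[i, j] = 1 iff instance i carries label j.
    train_base(W): trains a base model under the (N, m) cost matrix W
    and returns its (N, m) binary predictions on the training set.
    """
    N, m = Y.shape
    W = np.full((N, m), 1.0 / (N * m))            # uniform initial label costs
    models, alphas = [], []
    for _ in range(rounds):
        P = train_base(W)                          # feedback from current model
        miss = (P != Y).astype(float)              # failed (instance, label) pairs
        err = float((W * miss).sum())              # weighted multi-label error
        err = min(max(err, 1e-10), 0.5 - 1e-10)    # keep alpha finite and positive
        alpha = 0.5 * np.log((1.0 - err) / err)    # vote of this round's model
        W *= np.exp(alpha * (2.0 * miss - 1.0))    # raise costs of failed pairs
        W /= W.sum()                               # renormalize to a distribution
        models.append(P)
        alphas.append(alpha)
    return models, alphas
```

The ensemble decision would then be the sign of the alpha-weighted sum of the individual models' votes per label, mirroring the weighted voting described above.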

The remainder of this paper is organized as follows. In Section 2, we provide a basic description of the multi-label class imbalance problem. Section 3 presents some related work on multi-label class imbalance learning. Section 4 describes the proposed BLW-ELM algorithm in detail. Then, in Section 5, the experimental results and the corresponding discussions are presented. Finally, Section 6 summarizes the contributions of this paper and indicates future work.

Section snippets

What is class imbalance in multi-label data?

As mentioned in Section 1, multi-label data indicates that an instance associates with multiple labels. Suppose there is a multi-label data set D = {x1, x2, …, x|D|} and the corresponding label space O = {1, 2, …, |O|}, where |D| denotes the number of instances in the data set, |O| indicates the number of labels, and xi denotes a specific instance that associates with a label set Yi ⊆ O. In theory, there exist 2^|O| − 1 different label sets, considering that each instance associates with at least one label in O.
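For illustration, the skewness of each label can be quantified by a per-label imbalance ratio computed directly from the binary label matrix. The toy matrix below is hypothetical; this is a minimal sketch in numpy:

```python
import numpy as np

# Hypothetical label matrix Y: |D| = 6 instances, |O| = 3 labels (1 = relevant).
Y = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, 0],
              [0, 1, 0],
              [1, 0, 1],
              [1, 0, 0]])

# Per-label imbalance ratio: majority class count divided by minority class count.
pos = Y.sum(axis=0)                # relevant instances per label
neg = Y.shape[0] - pos             # irrelevant instances per label
ratio = np.maximum(pos, neg) / np.minimum(pos, neg)
print(ratio)                       # labels 1 and 3 are more skewed than label 2
```

Note that the ratio is computed independently per label: a data set can be nearly balanced on one label and severely imbalanced on another, which is exactly what makes the multi-label case harder than the single-label one.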

Extreme learning machine

ELM, which was proposed by Huang et al. [40], [41], [42], is a specific learning algorithm for the single-hidden-layer feedforward neural network (SLFN) (see Fig. 3). The main characteristic of ELM that distinguishes it from conventional learning algorithms for SLFNs is the random generation of hidden nodes. Therefore, ELM does not need to iteratively regulate parameters to make them approach optimal values, which gives it a faster learning speed and better generalization ability. Previous
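As a hedged sketch of this idea (not the authors' exact implementation), a basic ELM fits in a few lines of numpy: the input weights and biases are drawn at random and never tuned, and only the output weights beta are obtained in closed form by regularized least squares. The sigmoid activation, node count L and regularization C below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T, L=20, C=1e6):
    """Train a basic ELM: random hidden layer, closed-form output weights.

    X: (N, d) inputs; T: (N, m) targets; L: hidden nodes; C: regularization.
    """
    W = rng.standard_normal((X.shape[1], L))   # random input weights (never tuned)
    b = rng.standard_normal(L)                 # random hidden biases (never tuned)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))     # sigmoid hidden-layer outputs
    # Output weights by regularized least squares: beta = (H'H + I/C)^-1 H'T.
    beta = np.linalg.solve(H.T @ H + np.eye(L) / C, H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Usage: fit a tiny XOR-style data set in one shot, with no iterative training.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[0.0], [1.0], [1.0], [0.0]])
pred = elm_predict(X, *elm_train(X, T))
```

Because the only learned quantity is the linear solve for beta, training cost is dominated by one matrix factorization, which is the source of ELM's speed advantage mentioned above.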

The description about the data sets

In this paper, we collected 12 multi-label data sets from the MLC Toolbox [39] to validate the effectiveness of the proposed BLW-ELM algorithm. These data sets differ in the number of instances, the number of features, the number of labels, label cardinality and label density, and cover several different fields, including image, text and biology. Detailed information about these data sets is provided in Table 1.
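For reference, label cardinality and label density are standard statistics that follow directly from the binary label matrix; the tiny matrix below is a hypothetical example:

```python
import numpy as np

# Hypothetical binary label matrix: 4 instances, 3 labels.
Y = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1],
              [1, 0, 0]])

# Label cardinality: average number of relevant labels per instance.
cardinality = Y.sum(axis=1).mean()
# Label density: cardinality normalized by the size of the label space.
density = cardinality / Y.shape[1]
print(cardinality, density)   # 1.5 0.5
```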

Experimental settings

To validate the effectiveness and superiority of

Concluding remarks

In this paper, we improved the LW-ELM algorithm by integrating it into the Boosting learning framework, and designed a novel algorithm named BLW-ELM to address the multi-label class imbalance learning problem. The merit of the BLW-ELM algorithm lies in that it avoids exploring the complex data distribution directly, and instead adaptively assigns appropriate label weights. Extensive experimental results indicated that the proposed BLW-ELM algorithm is a robust, efficient and universal

Acknowledgements

This work was supported by the Natural Science Foundation of Jiangsu Province of China under grant No. BK20191457, the Open Project of the Artificial Intelligence Key Laboratory of Sichuan Province under grant No. 2019RYJ02, the National Natural Science Foundation of China under grants No. 61305058 and No. 61572242, and the China Postdoctoral Science Foundation under grants No. 2013M540404 and No. 2015T80481.


References (46)

  • H. Yu et al., Support vector machine-based optimized decision threshold adjustment strategy for classifying imbalanced data, Knowl.-Based Syst. (2015)
  • X. Ben et al., A general tensor representation framework for cross-view gait recognition, Pattern Recogn. (2019)
  • Z. Sun et al., A novel ensemble method for classifying imbalanced data, Pattern Recogn. (2015)
  • F. Charte et al., MLSMOTE: approaching imbalanced multilabel learning through synthetic instance generation, Knowl.-Based Syst. (2015)
  • M.A. Tahir et al., Inverse random under sampling for class imbalance problem and its application to multi-label classification, Pattern Recogn. (2012)
  • W. Zong et al., Weighted extreme learning machine for imbalance learning, Neurocomputing (2013)
  • K. Li et al., Boosting weighted ELM for imbalanced learning, Neurocomputing (2014)
  • G.B. Huang et al., Extreme learning machine: theory and applications, Neurocomputing (2006)
  • G. Huang et al., Trends in extreme learning machines: a review, Neural Networks (2015)
  • S. Garcia et al., Advanced non-parametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power, Inf. Sci. (2010)
  • M.L. Zhang et al., A review on multi-label learning algorithms, IEEE Trans. Knowl. Data Eng. (2016)
  • S. Kanj et al., Editing training data for multi-label classification with the k-nearest neighbor rule, Pattern Anal. Appl. (2016)
  • S.M. Tabatabaei et al., Toward non-intrusive load monitoring via multi-label classification, IEEE Trans. Smart Grid (2016)

    Ke Cheng was born in Chizhou, Anhui, China, in 1972. He received the B.S. degree in mining engineering from Hunan University of Science and Technology, Xiangtan, China, in 1996, and received M.S. degree in agricultural process engineering from Jiangsu University, Zhenjiang, China, in 1999, and Ph.D. degree in computer science from Nanjing University of Science and Technology, Nanjing, China, in 2006. Since 2008, he has been an Associate Professor in School of Computer, Jiangsu University of Science and Technology, Zhenjiang, China. From 2002 to 2003, he was a senior visiting scholar in School of Computer Science, Southeast University, Nanjing, China. He has authored or co-authored more than 20 research papers. His research interests include machine learning, data mining and bioinformatics.

Shang Gao received his B.S. degree in System Engineering from Air Force Engineering University, Xi'an, China, in 1993, the M.S. degree in Military Equipment from Air Force Engineering University, Xi'an, China, in 1996, and the Ph.D. degree in Pattern Recognition and Intelligent Systems from Nanjing University of Science and Technology, Nanjing, China, in 2006.

Since 2009, he has been a Professor in the School of Computer, Jiangsu University of Science and Technology, Zhenjiang, China. Since 2017, he has been the dean of the School of Computer, Jiangsu University of Science and Technology, Zhenjiang, China. He has published over 80 research articles in professional journals and conference proceedings. His research interests include optimization theory, swarm intelligence and machine learning.

    Wenlu Dong was born in Henan, China, in 1995. She received the B.S. degree in Computer Science and Technology from Henan Institute of Engineering, Zhengzhou, in 2018.

Since 2018, she has been working toward the M.S. degree in the School of Computer, Jiangsu University of Science and Technology, Zhenjiang, China. Her research interests mainly include machine learning and data mining.

Xibei Yang received the B.S. degree in computer science from Xuzhou Normal University, Xuzhou, China, in 2002, the M.S. degree in computer applications from Jiangsu University of Science and Technology, Zhenjiang, China, in 2006, and the Ph.D. degree in Pattern Recognition and Intelligent Systems from Nanjing University of Science and Technology, Nanjing, China, in 2010. Since 2018, he has been a Professor in the School of Computer, Jiangsu University of Science and Technology, Zhenjiang, China. He has published over 100 research articles in professional journals and conference proceedings. His research interests include granular computing and rough set theory. Dr. Yang is a reviewer for over 10 high-quality international journals, and a member of the organizing committees of several international conferences.

    Qi Wang received his M.S. degree in electrical and computer engineering from Sungkyunkwan University, South Korea in 2011, and Ph.D. degree in information and communications technology from University of Trento, Italy in 2015, respectively. He was a visiting scholar at North Carolina State University from 2013 to 2014.

    Dr. Wang is currently a lecturer with school of computer, Jiangsu University of Science and Technology, Zhenjiang, China. His research interests include wireless sensor network, wireless communications in smart grid, architecture reliability of smart grid with renewable energy system, and fast power charging station for electric vehicles.

Hualong Yu was born in Harbin, China, in 1982. He received the B.S. degree in computer science from Heilongjiang University, Harbin, China, in 2005, and the M.S. and Ph.D. degrees in computer science from Harbin Engineering University, Harbin, China, in 2008 and 2010, respectively. Since 2010, he has been an Associate Professor in the School of Computer, Jiangsu University of Science and Technology, Zhenjiang, China. From 2013 to 2017, he was a Post-Doctoral Fellow in the School of Automation, Southeast University, Nanjing, China. From 2017 to 2018, he was a senior visiting scholar in the Faculty of Information Technology, Monash University, Melbourne, Australia. He has authored or co-authored more than 70 journal and conference papers and 4 monographs, including publications in IEEE TNNLS, IEEE TFS, IEEE TCBB, IEEE Access, Information Sciences, KBS and Neurocomputing, among others. His research interests include machine learning, data mining and bioinformatics. Dr. Yu is an Associate Editor of IEEE Access, an active reviewer for more than 20 high-quality international journals, including IEEE TNNLS, TCYB, TKDE, TCBB and ACM TKDD, and a member of the organizing committees of several international conferences. He is also a member of ACM, the China Computer Federation (CCF) and the Youth Committee of the Chinese Association of Automation (CAA).
