Neurocomputing

Volume 403, 25 August 2020, Pages 360-370

Boosting label weighted extreme learning machine for classifying multi-label imbalanced data

https://doi.org/10.1016/j.neucom.2020.04.098

Highlights

  • The LW-ELM algorithm is integrated into the Boosting ensemble learning framework.

  • The BLW-ELM algorithm is designed to address the multi-label imbalanced classification problem.

  • BLW-ELM adaptively tunes label weights without exploring the prior distribution directly.

  • Experimental results indicate that the BLW-ELM algorithm is robust and time-saving.

Abstract

As a flexible and efficient cost-sensitive learning algorithm, the label weighted extreme learning machine (LW-ELM) has been proposed to address the class imbalance learning problem on multi-label data. However, owing to its adoption of empirical costs, the classification performance of LW-ELM cannot be fully guaranteed. To solve this problem, an improved algorithm called BLW-ELM, which integrates LW-ELM into the Boosting ensemble learning framework, is presented in this paper. Specifically, BLW-ELM assigns an appropriate cost to each training label of each training instance according to the iterative feedback of the training results, and thus avoids exploring the intricate distribution of multi-label data directly. That is to say, BLW-ELM is a universal and self-adaptive algorithm that promotes the robustness of classification regardless of the data distribution type. Twelve multi-label data sets are used to verify the effectiveness and superiority of the proposed algorithm. Experimental results indicate that the proposed BLW-ELM algorithm is significantly superior to the LW-ELM algorithm and many other state-of-the-art multi-label imbalance learning algorithms, and that it generally needs far less training time than those sophisticated algorithms.

Introduction

In supervised learning, single-label learning, in which each instance associates with only one unique label, is the most studied paradigm. In real-world applications, however, an object may be simultaneously related to multiple different labels: for example, an image may include the labels mountain, lake, cascade, tree, cloud, sky and sun simultaneously (see Fig. 1); a news report may cover several different topics, such as economy, politics and sport; and a protein may hold several different biological functions synchronously. We refer to this type of data as multi-label data and to the modeling procedure as multi-label learning. In the past decade, multi-label learning has gradually developed into one of the research hotspots in the field of machine learning [1].

Most existing multi-label learning studies focus on how to improve the recognition rate, either by modifying a single-label learning model to adapt it to multi-label data [2], [3], or by mining the correlations among labels to promote the quality of the model [4], [5], [6], but in general they ignore the issue of class imbalance. In fact, multi-label learning faces a greater threat from class imbalance than single-label learning, as for each label in multi-label data the instances might have a seriously skewed distribution [7]. We note that, as another research hotspot, a wealth of class imbalance learning methods already exist, including sampling [8], [9], [10], [11], [12], cost-sensitive learning [13], [14], [15], [16], threshold strategies [17], [18], [19], [20], one-class learning [21], [22], metric learning [23], [24] and ensemble learning [25], [26], [27], [28], [29]. However, most of them are designed to address only the single-label classification problem, and it is difficult to transform them directly to deal with multi-label imbalanced data.

In recent years, some researchers have noted the multi-label imbalanced classification problem and have proposed several effective solutions [7], [30], [31], [32], [33], [34], [35], [36]. With these techniques, the impact of a class-imbalanced distribution can be alleviated to some extent; however, each one has its inherent drawbacks, e.g., low robustness caused by empirical or random manipulation, or high time complexity due to the adoption of complex calculations.

The label weighted extreme learning machine (LW-ELM) [34] is an efficient class imbalance learning algorithm that can be used to classify multi-label data. As a cost-sensitive learning algorithm, LW-ELM not only inherits the robustness and fast training speed of the extreme learning machine (ELM), but also presents higher flexibility than the other cost-sensitive extreme learning machine algorithm, WELM [37]. Although the LW-ELM algorithm holds several remarkable merits, we note that its modeling quality is limited because, in general, the label costs are designated empirically. Considering that the empirical costs are associated only with the class imbalance ratio and neglect the specific data distribution, the quality of the model constructed by LW-ELM can be further improved.

In this paper, we draw on the idea of the Boosting WELM algorithm [38], which integrates the WELM model into the Boosting ensemble learning framework, and propose a novel algorithm named BLW-ELM. BLW-ELM first empirically designates the initial label costs based on the class imbalance ratios, then adjusts the cost of each label belonging to each instance according to the feedback of the current model, and finally organizes all the trained models to make decisions in the form of weighted voting. The advantage of the BLW-ELM algorithm lies in that it can significantly improve the generalization ability of LW-ELM while avoiding exploring the prior distribution of the multi-label data.
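The feedback loop just described can be sketched as an AdaBoost-style reweighting over (instance, label) pairs. The update rule and the stand-in base learner below are illustrative assumptions, not the paper's exact BLW-ELM formulas; `train_base` represents training an LW-ELM-like model under the current cost matrix:

```python
import numpy as np

def boost_label_weights(Y, train_base, rounds=10):
    """Boost a cost-sensitive base learner over a binary label matrix Y.

    Y: (N, m) matrix, Y[i, j] = 1 iff instance i carries label j.
    train_base(W): trains a base model under the (N, m) cost matrix W
    and returns its (N, m) binary predictions on the training set.
    """
    N, m = Y.shape
    W = np.full((N, m), 1.0 / (N * m))            # uniform initial label costs
    models, alphas = [], []
    for _ in range(rounds):
        P = train_base(W)                          # feedback from current model
        miss = (P != Y).astype(float)              # failed (instance, label) pairs
        err = float((W * miss).sum())              # weighted multi-label error
        err = min(max(err, 1e-10), 0.5 - 1e-10)    # keep alpha finite and positive
        alpha = 0.5 * np.log((1.0 - err) / err)    # vote of this round's model
        W *= np.exp(alpha * (2.0 * miss - 1.0))    # raise costs of failed pairs
        W /= W.sum()                               # renormalize to a distribution
        models.append(P)
        alphas.append(alpha)
    return models, alphas
```

The ensemble decision would then be the sign of the alpha-weighted sum of the individual models' votes per label, mirroring the weighted voting described above.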

The remainder of this paper is organized as follows. In Section 2, we provide a basic description of the multi-label class imbalance problem. Section 3 presents some related work on multi-label class imbalance learning. Section 4 describes the proposed BLW-ELM algorithm in detail. Then, in Section 5, the experimental results and the corresponding discussions are presented. Finally, Section 6 summarizes the contributions of this paper and indicates future work.

Section snippets

What is class imbalance in multi-label data?

As mentioned in Section 1, multi-label data indicates that an instance associates with multiple labels. Suppose there is a multi-label data set D = {x1, x2, …, x|D|} and the corresponding label space O = {1, 2, …, |O|}, where |D| denotes the number of instances in the data set, |O| indicates the number of labels, and xi denotes a specific instance that associates with a label set Yi ⊆ O. In theory, there exist 2^|O| − 1 different label sets, considering that each instance associates with at least one label in O.
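For illustration, the skewness of each label can be quantified by a per-label imbalance ratio computed directly from the binary label matrix. The toy matrix below is hypothetical; this is a minimal sketch in numpy:

```python
import numpy as np

# Hypothetical label matrix Y: |D| = 6 instances, |O| = 3 labels (1 = relevant).
Y = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, 0],
              [0, 1, 0],
              [1, 0, 1],
              [1, 0, 0]])

# Per-label imbalance ratio: majority class count divided by minority class count.
pos = Y.sum(axis=0)                # relevant instances per label
neg = Y.shape[0] - pos             # irrelevant instances per label
ratio = np.maximum(pos, neg) / np.minimum(pos, neg)
print(ratio)                       # labels 1 and 3 are more skewed than label 2
```

Note that the ratio is computed independently per label: a data set can be nearly balanced on one label and severely imbalanced on another, which is exactly what makes the multi-label case harder than the single-label one.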

Extreme learning machine

ELM, which was proposed by Huang et al. [40], [41], [42], is a specific learning algorithm for the single-hidden-layer feedforward neural network (SLFN) (see Fig. 3). The main characteristic of ELM that distinguishes it from conventional learning algorithms for SLFNs is the random generation of hidden nodes. Therefore, ELM does not need to iteratively regulate parameters to make them approach optimal values, which gives it a faster learning speed and better generalization ability. Previous
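As a hedged sketch of this idea (not the authors' exact implementation), a basic ELM fits in a few lines of numpy: the input weights and biases are drawn at random and never tuned, and only the output weights beta are obtained in closed form by regularized least squares. The sigmoid activation, node count L and regularization C below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T, L=20, C=1e6):
    """Train a basic ELM: random hidden layer, closed-form output weights.

    X: (N, d) inputs; T: (N, m) targets; L: hidden nodes; C: regularization.
    """
    W = rng.standard_normal((X.shape[1], L))   # random input weights (never tuned)
    b = rng.standard_normal(L)                 # random hidden biases (never tuned)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))     # sigmoid hidden-layer outputs
    # Output weights by regularized least squares: beta = (H'H + I/C)^-1 H'T.
    beta = np.linalg.solve(H.T @ H + np.eye(L) / C, H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Usage: fit a tiny XOR-style data set in one shot, with no iterative training.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[0.0], [1.0], [1.0], [0.0]])
pred = elm_predict(X, *elm_train(X, T))
```

Because the only learned quantity is the linear solve for beta, training cost is dominated by one matrix factorization, which is the source of ELM's speed advantage mentioned above.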

The description about the data sets

In this paper, we collected 12 multi-label data sets from the MLC Toolbox [39] to validate the effectiveness of the proposed BLW-ELM algorithm. These data sets differ in the number of instances, the number of features, the number of labels, label cardinality and label density, and cover several different fields, including image, text and biology. Detailed information about these data sets is provided in Table 1.
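For reference, label cardinality and label density are standard statistics that follow directly from the binary label matrix; the tiny matrix below is a hypothetical example:

```python
import numpy as np

# Hypothetical binary label matrix: 4 instances, 3 labels.
Y = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1],
              [1, 0, 0]])

# Label cardinality: average number of relevant labels per instance.
cardinality = Y.sum(axis=1).mean()
# Label density: cardinality normalized by the size of the label space.
density = cardinality / Y.shape[1]
print(cardinality, density)   # 1.5 0.5
```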

Experimental settings

To validate the effectiveness and superiority of

Concluding remarks

In this paper, we improved the LW-ELM algorithm by integrating it into the Boosting learning framework, and designed a novel algorithm named BLW-ELM to address the multi-label class imbalance learning problem. The merit of the BLW-ELM algorithm lies in that it avoids exploring the complex data distribution directly, and instead adaptively assigns appropriate label weights. Extensive experimental results indicated that the proposed BLW-ELM algorithm is a robust, efficient and universal

Acknowledgements

This work was supported by the Natural Science Foundation of Jiangsu Province of China under grant No. BK20191457, the Open Project of the Artificial Intelligence Key Laboratory of Sichuan Province under grant No. 2019RYJ02, the National Natural Science Foundation of China under grants No. 61305058 and No. 61572242, and the China Postdoctoral Science Foundation under grants No. 2013M540404 and No. 2015T80481.


References (46)

  • H. Yu et al., Support vector machine-based optimized decision threshold adjustment strategy for classifying imbalanced data, Knowl.-Based Syst. (2015)
  • X. Ben et al., A general tensor representation framework for cross-view gait recognition, Pattern Recogn. (2019)
  • Z. Sun et al., A novel ensemble method for classifying imbalanced data, Pattern Recogn. (2015)
  • F. Charte et al., MLSMOTE: approaching imbalanced multilabel learning through synthetic instance generation, Knowl.-Based Syst. (2015)
  • M.A. Tahir et al., Inverse random under sampling for class imbalance problem and its application to multi-label classification, Pattern Recogn. (2012)
  • W. Zong et al., Weighted extreme learning machine for imbalance learning, Neurocomputing (2013)
  • K. Li et al., Boosting weighted ELM for imbalanced learning, Neurocomputing (2014)
  • G.B. Huang et al., Extreme learning machine: theory and applications, Neurocomputing (2006)
  • G. Huang et al., Trends in extreme learning machines: a review, Neural Networks (2015)
  • S. Garcia et al., Advanced non-parametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power, Inf. Sci. (2010)
  • M.L. Zhang et al., A review on multi-label learning algorithms, IEEE Trans. Knowl. Data Eng. (2016)
  • S. Kanj et al., Editing training data for multi-label classification with the k-nearest neighbor rule, Pattern Anal. Appl. (2016)
  • S.M. Tabatabaei et al., Toward non-intrusive load monitoring via multi-label classification, IEEE Trans. Smart Grid (2016)

    Ke Cheng was born in Chizhou, Anhui, China, in 1972. He received the B.S. degree in mining engineering from Hunan University of Science and Technology, Xiangtan, China, in 1996, and received M.S. degree in agricultural process engineering from Jiangsu University, Zhenjiang, China, in 1999, and Ph.D. degree in computer science from Nanjing University of Science and Technology, Nanjing, China, in 2006. Since 2008, he has been an Associate Professor in School of Computer, Jiangsu University of Science and Technology, Zhenjiang, China. From 2002 to 2003, he was a senior visiting scholar in School of Computer Science, Southeast University, Nanjing, China. He has authored or co-authored more than 20 research papers. His research interests include machine learning, data mining and bioinformatics.

Shang Gao received his B.S. degree in System Engineering from Air Force Engineering University, Xi'an, China, in 1993, the M.S. degree in Military Equipment from Air Force Engineering University, Xi'an, China, in 1996, and the Ph.D. degree in Pattern Recognition and Intelligent Systems from Nanjing University of Science and Technology, Nanjing, China, in 2006.

Since 2009, he has been a Professor in the School of Computer, Jiangsu University of Science and Technology, Zhenjiang, China. Since 2017, he has been the dean of the School of Computer, Jiangsu University of Science and Technology, Zhenjiang, China. He has published over 80 research articles in professional journals and conference proceedings. His research interests include optimization theory, swarm intelligence and machine learning.

    Wenlu Dong was born in Henan, China, in 1995. She received the B.S. degree in Computer Science and Technology from Henan Institute of Engineering, Zhengzhou, in 2018.

Since 2018, she has been working toward the M.S. degree in the School of Computer, Jiangsu University of Science and Technology, Zhenjiang, China. Her research interests mainly include machine learning and data mining.

Xibei Yang received the B.S. degree in computer science from Xuzhou Normal University, Xuzhou, China, in 2002, the M.S. degree in computer applications from Jiangsu University of Science and Technology, Zhenjiang, China, in 2006, and the Ph.D. degree in Pattern Recognition and Intelligent Systems from Nanjing University of Science and Technology, Nanjing, China, in 2010. Since 2018, he has been a Professor in the School of Computer, Jiangsu University of Science and Technology, Zhenjiang, China. He has published over 100 research articles in professional journals and conference proceedings. His research interests include granular computing and rough set theory. Dr. Yang is a reviewer for over 10 high-quality international journals, and a member of the organizing committees of several international conferences.

    Qi Wang received his M.S. degree in electrical and computer engineering from Sungkyunkwan University, South Korea in 2011, and Ph.D. degree in information and communications technology from University of Trento, Italy in 2015, respectively. He was a visiting scholar at North Carolina State University from 2013 to 2014.

    Dr. Wang is currently a lecturer with school of computer, Jiangsu University of Science and Technology, Zhenjiang, China. His research interests include wireless sensor network, wireless communications in smart grid, architecture reliability of smart grid with renewable energy system, and fast power charging station for electric vehicles.

Hualong Yu was born in Harbin, China, in 1982. He received the B.S. degree in computer science from Heilongjiang University, Harbin, China, in 2005, and the M.S. and Ph.D. degrees in computer science from Harbin Engineering University, Harbin, China, in 2008 and 2010, respectively. Since 2010, he has been an Associate Professor in the School of Computer, Jiangsu University of Science and Technology, Zhenjiang, China. From 2013 to 2017, he was a Post-Doctoral Fellow in the School of Automation, Southeast University, Nanjing, China. From 2017 to 2018, he was a senior visiting scholar in the Faculty of Information Technology, Monash University, Melbourne, Australia. He has authored or co-authored more than 70 journal and conference papers and 4 monographs, including publications in IEEE TNNLS, IEEE TFS, IEEE TCBB, IEEE Access, Information Sciences, KBS and Neurocomputing, among others. His research interests include machine learning, data mining and bioinformatics. Dr. Yu is an Associate Editor of IEEE Access, an active reviewer for more than 20 high-quality international journals, including IEEE TNNLS, TCYB, TKDE, TCBB and ACM TKDD, and a member of the organizing committees of several international conferences. He is also a member of ACM, the China Computer Federation (CCF) and the Youth Committee of the Chinese Association of Automation (CAA).
