Pattern Recognition

Volume 45, Issue 12, December 2012, Pages 4451-4465

A noise-detection based AdaBoost algorithm for mislabeled data

https://doi.org/10.1016/j.patcog.2012.05.002

Abstract

Noise sensitivity is a well-known issue of the AdaBoost algorithm. Previous work has shown that AdaBoost is prone to overfitting on noisy data sets because it consistently assigns high weights to hard-to-learn instances (mislabeled instances or outliers). In this paper, a new boosting approach, named noise-detection based AdaBoost (ND-AdaBoost), is proposed to combine classifiers by placing emphasis during training on misclassified noisy instances and correctly classified non-noisy instances. Specifically, the algorithm integrates a noise-detection based loss function into AdaBoost to adjust the weight distribution at each iteration. Two evaluation criteria, based on the k-nearest-neighbor (k-NN) rule and on expectation maximization (EM) respectively, are constructed to detect noisy instances. Further, a regeneration condition is presented and analyzed to control the ensemble training error bound of the proposed algorithm, which provides theoretical support. Finally, we conduct experiments on selected binary UCI benchmark data sets and demonstrate that the proposed algorithm is more robust than standard AdaBoost and its variants on noisy data sets.

Highlights

• A noise-detection based loss function is proposed to distinguish mislabeled data from misclassified instances.
• A new regeneration condition is added to guarantee the generalization performance of the proposed algorithm.
• The proposed methods significantly outperform most of the state-of-the-art methods in the high-noise region.

Introduction

AdaBoost [1], [2], [3] is one of the most popular techniques for generating ensembles due to its adaptability and simplicity. In the past few decades, AdaBoost has been successfully extended to many fields such as cost-sensitive classification [4], [5], semi-supervised learning [6], tracking [7] and network intrusion detection [8]. The main idea of AdaBoost is to construct a succession of weak learners by using different training sets derived from resampling the original data. Through a weighted vote, these learners are combined to predict the class label of a new test instance. Normally, the performance of a weak learner is only slightly better than random guessing [9]. A weak learner used in the ensemble is called a base classifier or component classifier.

However, AdaBoost tends to overfit as the number of combined classifiers increases. Some researchers attributed this failure of AdaBoost to a high proportion of noisy instances [10], [11]. In [10], Rätsch et al. defined three conditions that identify noisy data: (1) overlapping class probability distributions, (2) outliers and (3) mislabeled instances. It should be noted that our work only discusses noisy instances of the mislabeled type. Mislabeled instances typically refer to instances whose class labels are inconsistent with those of most of their surrounding neighbors. Dietterich [11] designed an experimental test that demonstrated the poor generalization of AdaBoost with C4.5 by adding artificial noise. He explained that mislabeled instances are likely to be assigned higher weights, which gives rise to the unsatisfactory performance of AdaBoost.

By analyzing the inner driving force of AdaBoost, one may notice that it essentially minimizes an exponential loss function [12] sequentially. In detail, it penalizes misclassified instances with increased weights while assigning lessened weights to correctly classified instances for the next iteration. In this way, AdaBoost focuses only on punishing misclassified instances while ignoring their mislabeled property, which leads to the noise sensitivity of AdaBoost.
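For reference, the exponential loss that standard AdaBoost minimizes and the resulting weight-update rule can be written as follows; this is the standard formulation, stated in the notation introduced in Section 3 rather than quoted from this paper:

\[
L\bigl(y_n, f(x_n)\bigr) = \exp\bigl(-y_n f(x_n)\bigr), \qquad f(x_n) = \sum_{t=1}^{T} \alpha_t h_t(x_n),
\]

\[
w_{t+1}(n) = \frac{w_t(n)\,\exp\bigl(-\alpha_t\, y_n h_t(x_n)\bigr)}{Z_t}, \qquad \alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t},
\]

where $\varepsilon_t$ is the weighted training error of $h_t$ and $Z_t$ is a normalizing factor. Since $y_n h_t(x_n) = -1$ for a misclassified instance, its weight grows by the factor $e^{\alpha_t}$ whether or not its label is trustworthy, which is exactly the behavior the revised loss function of Section 4 is designed to change.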

Therefore, in this paper, a noise-detection based AdaBoost algorithm (ND-AdaBoost), which takes the mislabeled property of instances into account, is proposed to address the noise sensitivity and overfitting problems. The main contributions of this paper are as follows.

(1) Four types of instances with respect to noise and class label decisions are introduced, in contrast to the conventional taxonomy of misclassified and correctly classified instances. Specifically, they are: correctly classified noisy instances, misclassified noisy instances, misclassified non-noisy instances, and correctly classified non-noisy instances. This division follows the assumption that mislabeled instances should be misclassified with as high a probability as possible, while correctly labeled instances are expected to be classified correctly.

(2) A revised exponential loss function is proposed by considering these types of instances. At each iteration, a noise label determined by a noise-detection function is assigned to each instance. With the new loss function, we aim to minimize the optimization objective by assigning lower weights to misclassified noisy instances and correctly classified non-noisy instances. To identify noisy data, both EM and k-NN based functions are employed to test the noise-labeling effect under different detection methods; a k-NN based sketch is given after this list.

(3) In order to guarantee the generalization ability of the proposed method, a new regeneration condition, based on an analysis of the empirical margin error bound of ND-AdaBoost, is developed to keep the training error bound of the proposed algorithm within a reasonable range.
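Section 1 characterizes mislabeled instances as those whose class labels are inconsistent with most of their surrounding neighbors. In that spirit, the following is a minimal Python sketch of a k-NN based noise detector; the value of k, the Euclidean metric and the majority-vote threshold are illustrative assumptions rather than the paper's exact criterion:

import numpy as np

def knn_noise_labels(X, y, k=5):
    # Flag instances whose label disagrees with the majority of their
    # k nearest neighbors (Euclidean distance). True = detected as noisy.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    noisy = np.zeros(n, dtype=bool)
    # Pairwise squared Euclidean distances, shape (n, n).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    for i in range(n):
        order = np.argsort(d2[i])
        neighbors = [j for j in order if j != i][:k]
        agree = np.sum(y[neighbors] == y[i])
        noisy[i] = agree < k / 2.0   # label in the minority among neighbors
    return noisy

An instance flagged by such a detector receives the noise label that the revised loss function acts on at each boosting iteration.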

The performance of the noise-detection based AdaBoost algorithm is examined through experiments on 13 binary data sets from the UCI repository [13]. Experimental results show that the proposed algorithm outperforms other boosting methods in noisy environments.

Section snippets

Related work

For decades, researchers have made various modifications to the AdaBoost technique to mitigate the detrimental effect of noise, along two directions: (1) revising the optimization objective (loss function) and rebuilding the weight-updating mechanism according to the corresponding loss function; (2) limiting the incremental weight update of noisy instances or discarding them directly. Through these methods, the disturbance originating from mistrusted instances can be minimized, and the noise

Framework of AdaBoost algorithm

Since our work is an extension of the AdaBoost approach, preliminary knowledge of AdaBoost is first introduced in this section. Suppose we have a two-class supervised learning task. Let the n-th training instance be denoted as $z_n=(x_n,y_n)$, $n=1,\ldots,N$, where $x_n\in X$ and $y_n$ is the class label of $x_n$. Let $H$ be a set of component classifiers: $H=\{h_t(x_n): X\to\{-1,+1\},\ t=1,\ldots,T\}$. The pseudo-code of the basic AdaBoost algorithm for binary classification is presented in Algorithm 1.

Algorithm 1

AdaBoost.

Input: A set of labeled training instances: $S=\{(x_n,y_n)\}_{n=1}^{N}$
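The listing above is truncated in this snippet view. For orientation, the following is a minimal Python sketch of the standard binary AdaBoost loop; the decision-stump base learner is an illustrative assumption, since any weak learner with weighted error $\varepsilon_t < 0.5$ can be plugged in:

import numpy as np

def train_stump(X, y, w):
    # Exhaustive search for the threshold stump minimizing weighted error.
    best = (0, 0.0, 1, np.inf)  # (feature, threshold, polarity, error)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, thr, pol, err)
    return best

def adaboost(X, y, T=20):
    # Standard binary AdaBoost; labels y must be in {-1, +1}.
    n = len(y)
    w = np.full(n, 1.0 / n)              # uniform initial weights
    ensemble = []
    for _ in range(T):
        j, thr, pol, err = train_stump(X, y, w)
        if err >= 0.5:                   # weak-learner condition violated
            break
        err = max(err, 1e-10)            # guard against division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
        w = w * np.exp(-alpha * y * pred)  # exponential weight update
        w = w / w.sum()                    # normalization (Z_t)
        ensemble.append((alpha, j, thr, pol))
    return ensemble

def predict(ensemble, X):
    # Combine component classifiers through a weighted vote.
    score = np.zeros(len(X))
    for alpha, j, thr, pol in ensemble:
        score = score + alpha * np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
    return np.sign(score)

ND-AdaBoost modifies the weight update of this loop using the noise labels described in the following section.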

Noise-detection based AdaBoost

Before introducing our proposed algorithm, two important assumptions related to the noise issue are presented as follows:

  • Assumption 1: An optimal classifier is expected to correctly classify clean, i.e., non-noisy, instances.

  • Assumption 2: An optimal classifier is expected to misclassify mislabeled, i.e., noisy, instances.
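Read together with contribution (2) in Section 1, these assumptions fix the direction of the weight update: weights should shrink for misclassified noisy instances and for correctly classified non-noisy instances, and grow for the other two types. The Python sketch below illustrates one way to encode this by flipping the margin sign for detected-noisy instances; the sign flip is an illustrative assumption, not the paper's exact loss function:

import numpy as np

def nd_weight_factor(alpha, y, pred, noisy):
    # Per-instance multiplicative weight factor. margin = +1 for a
    # correctly classified instance, -1 for a misclassified one.
    margin = y * pred
    # For detected-noisy instances the desired outcome is
    # misclassification (Assumption 2), so the margin sign is flipped.
    effective = np.where(noisy, -margin, margin)
    # effective = +1: instance behaves as desired -> weight shrinks;
    # effective = -1: instance behaves undesirably -> weight grows.
    return np.exp(-alpha * effective)

Under this sketch, a misclassified noisy instance (margin $-1$, flipped to $+1$) and a correctly classified non-noisy instance (margin $+1$) both receive a factor smaller than one, matching the goal stated in contribution (2).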

Experiments

The motivation of ND-AdaBoost is to improve the noise tolerance of conventional AdaBoost. Therefore, our experiments focus primarily on comparing classification accuracy/error at different noise levels. Two groups of experiments are conducted to show the effectiveness of the ND-AdaBoost algorithm from different aspects. The first group of experiments (Section 5.1) demonstrates the superiority of the noise-detection based method over benchmark algorithms in the GML AdaBoost Matlab Toolbox [34] with

Conclusion

To address the incompatibility between the AdaBoost algorithm and noisy instances, this paper designs a noise-detection version of the boosting algorithm. It labels the noisy instances at each iteration and adds a regeneration condition to control the ensemble training error bound. The experimental results with artificially added mislabeled instances show that the different noise-identification functions (k-NN and EM based) affect the performance of the ND-AdaBoost algorithm, i.e., the testing result of k-NN

Acknowledgements

This work is supported by City University Grant 9610025 and City University Strategic Grant 7002680. The authors would like to thank the two anonymous reviewers for their comments, which contributed significantly to improving the quality of this paper.

References (41)

  • Y. Freund et al.

    A decision-theoretic generalization of on-line learning and an application to boosting

    Journal of Computer and System Sciences

    (1997)
  • H. Masnadi-Shirazi et al.

    Cost-sensitive boosting

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2010)
  • P.K. Mallapragada et al.

    SemiBoost: boosting for semi-supervised learning

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2009)
  • S. Avidan

    Ensemble tracking

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2007)
  • W. Hu et al.

    AdaBoost-based algorithm for network intrusion detection

    IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics

    (2008)
  • R. Schapire

    The strength of weak learnability

    Machine Learning

    (1990)
  • G. Rätsch et al.

    Soft margins for AdaBoost

    Machine Learning

    (2001)
  • T. Dietterich

    An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization

    Machine Learning

    (2000)
  • C. Bishop

    Pattern Recognition and Machine Learning

    (2006)
  • A. Frank, A. Asuncion, UCI Machine Learning Repository 〈http://archive.ics.uci.edu/ml〉,...

Jingjing Cao received her B.S. degree in Information and Computing Science from Dalian Maritime University, China, in 2006 and her M.S. degree in Applied Mathematics from the same university in 2008. She is currently pursuing a Ph.D. in the Department of Computer Science at City University of Hong Kong, Kowloon, Hong Kong. Her research interests are in Ensemble Learning, Evolutionary Algorithms, and their applications.

Sam Kwong received his B.Sc. degree in electrical engineering from the State University of New York at Buffalo, USA, in 1983 and his M.A.Sc. degree from the University of Waterloo, Canada, in 1985. He obtained his Ph.D. from the University of Hagen, Germany, in 1996. From 1985 to 1987, he was a diagnostic engineer with Control Data Canada, where he designed diagnostic software to detect manufacturing faults of the VLSI chips in the Cyber 430 machine. He later joined Bell Northern Research Canada as a Member of Scientific Staff. In 1990, he joined the City University of Hong Kong as a lecturer in the Department of Electronic Engineering. He is currently a Professor in the Department of Computer Science. His research interests are in Pattern Recognition, Evolutionary Algorithms and Video Coding.

Ran Wang received her Bachelor's degree from the School of Information Science & Technology, Beijing Forestry University, Beijing, China, in 2009. She is currently a Ph.D. candidate in the Department of Computer Science, City University of Hong Kong. Her research interests focus on machine learning and its related applications.
