A noise-detection based AdaBoost algorithm for mislabeled data
Highlights
► A noise-detection based loss function is proposed to distinguish mislabeled instances from merely misclassified ones.
► A new regeneration condition is added to guarantee the generalization performance of the proposed algorithm.
► The proposed methods significantly outperform most state-of-the-art methods in the high-noise region.
Introduction
AdaBoost [1], [2], [3] is one of the most popular techniques for generating ensembles due to its adaptability and simplicity. In the past few decades, AdaBoost has been successfully extended to many fields such as cost-sensitive classification [4], [5], semi-supervised learning [6], tracking [7] and network intrusion detection [8]. The main idea of AdaBoost is to construct a succession of weak learners by using different training sets derived from resampling the original data. Through a weighted vote, these learners are combined to predict the class label of a new test instance. Normally, the performance of a weak learner is only slightly better than random guessing [9]. A weak learner used in the ensemble is called a base classifier or component classifier.
However, AdaBoost tends to overfit as the number of combined classifiers increases. Some researchers have attributed this failure of AdaBoost to a high proportion of noisy instances [10], [11]. In [10], Rätsch et al. defined three conditions that identify noisy data: (1) overlapping class probability distributions, (2) outliers and (3) mislabeled instances. It should be noted that our work only discusses noisy instances of the mislabeled type. Mislabeled instances typically refer to those instances whose class labels are inconsistent with the labels of most of their surrounding neighbors. Dietterich [11] designed an experimental test that demonstrated the poor generalization of AdaBoost with C4.5 under artificially added noise. He explained that the mislabeled instances are likely to be assigned higher weights, which gives rise to the unsatisfactory performance of AdaBoost.
By analyzing the driving mechanism of AdaBoost, one may notice that it essentially minimizes an exponential loss function [12] sequentially. In detail, it penalizes misclassified instances by increasing their weights while assigning decreased weights to correctly classified instances for the next iteration. In this way, AdaBoost focuses only on punishing the misclassified instances while ignoring whether they are mislabeled, which leads to the noise sensitivity of AdaBoost.
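The weight-update rule described above can be sketched in a few lines. This is a minimal illustration of the standard AdaBoost update that minimizes the exponential loss; the function name and interface are ours, not from the paper:

```python
import numpy as np

def adaboost_reweight(weights, y_true, y_pred):
    """One round of the standard AdaBoost weight update.

    Labels are assumed to be in {-1, +1}. Misclassified instances are
    multiplied by e^{alpha} (weight increased); correctly classified
    instances are multiplied by e^{-alpha} (weight decreased).
    """
    miss = (y_true != y_pred)                      # misclassified instances
    err = np.sum(weights[miss]) / np.sum(weights)  # weighted training error
    alpha = 0.5 * np.log((1.0 - err) / err)        # classifier coefficient
    new_w = weights * np.exp(alpha * np.where(miss, 1.0, -1.0))
    return new_w / new_w.sum(), alpha
```

With uniform initial weights and one misclassified instance out of four, the misclassified instance absorbs half of the total weight after one round, which is exactly the "emphasis on penalizing misclassified instances" the text describes.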
Therefore, in this paper, a noise-detection based AdaBoost algorithm (ND-AdaBoost), associated with the mislabeled properties of instances, is proposed to address the noise sensitivity and overfitting problem. The main contributions of this paper are as follows.
(1) Four types of instances, defined jointly by noise labels and classification decisions, are introduced; this differs from the conventional taxonomy of merely misclassified versus correctly classified instances. Specifically, they are correctly classified noisy instances, misclassified noisy instances, misclassified non-noisy instances and correctly classified non-noisy instances. This division is in line with the assumption that mislabeled instances should be misclassified with as high a probability as possible, while correctly labeled instances are expected to be classified correctly.
(2) A revised exponential loss function is proposed that takes these instance types into account. At each iteration, a noise label determined by a noise-detection function is assigned to each instance. With the new loss function, we aim to minimize the optimization objective by assigning lower weights to misclassified noisy instances and correctly classified non-noisy instances. To identify noisy data, both EM-based and k-NN-based functions are employed to test the noise-labeling effects of different detection methods.
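The exact revised loss function is not reproduced in this excerpt, but the weighting principle it implies can be illustrated. In the sketch below (an assumption-based illustration, not the paper's actual update rule), instances whose outcome matches expectation, i.e. noisy-and-misclassified or non-noisy-and-correctly-classified, are down-weighted, while the other two types are up-weighted:

```python
import numpy as np

def nd_reweight(weights, correct, noisy, alpha):
    """Illustrative noise-aware reweighting.

    correct, noisy : boolean arrays marking each instance's classification
                     outcome and its detected noise label.
    alpha          : classifier coefficient from the current round.
    """
    # "Consistent" means the classifier behaved as expected: it classified
    # clean instances correctly and misclassified the mislabeled ones.
    consistent = (correct & ~noisy) | (~correct & noisy)
    new_w = weights * np.exp(alpha * np.where(consistent, -1.0, 1.0))
    return new_w / new_w.sum()
```

Under this scheme a misclassified noisy instance no longer receives the ever-growing weight that makes plain AdaBoost noise-sensitive.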
(3) To guarantee the generalization ability of the proposed method, a new regeneration condition is developed based on an analysis of the empirical margin error bound of ND-AdaBoost, so as to keep the bound of the proposed algorithm within a reasonable range.
The performance of the noise-detection based AdaBoost algorithm is examined through experiments on 13 binary data sets from the UCI repository [13]. Experimental results show that the proposed algorithm outperforms other boosting methods in noisy environments.
Section snippets
Related work
For decades, researchers have modified the AdaBoost technique to handle its sensitivity to noise along two directions: (1) revising the optimization objective (loss function) and rebuilding the weight-updating mechanism according to the revised loss; (2) limiting the weight increase of noisy instances or discarding them directly. Through these methods, the disturbance originating from untrustworthy instances can be minimized, and the noise …
Framework of AdaBoost algorithm
Since our work is an extension of the AdaBoost approach, preliminary knowledge of AdaBoost is first introduced in this section. Suppose we have a two-class supervised learning task. Let the n-th training instance be denoted as (xn, yn), where yn ∈ {−1, +1} is the class label of xn. Let H = {h1, h2, …, hT} be a set of component classifiers. The pseudo-code of the basic AdaBoost algorithm for binary classification is presented in Algorithm 1. Algorithm 1 AdaBoost. Input: A set of labeled training instances {(x1, y1), …, (xN, yN)}.
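The full pseudo-code of Algorithm 1 is truncated in this excerpt, so the following is a minimal runnable Python sketch of standard binary AdaBoost, using decision stumps as an illustrative choice of weak learner (the function names are ours):

```python
import numpy as np

def fit_stump(X, y, w):
    """Best weighted threshold stump over all features; labels in {-1, +1}."""
    best = (None, None, None, np.inf)  # (feature, threshold, polarity, error)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - t) >= 0, 1, -1)
                err = np.sum(w[pred != y])
                if err < best[3]:
                    best = (j, t, pol, err)
    return best

def adaboost_fit(X, y, T=10):
    n = len(y)
    w = np.full(n, 1.0 / n)            # uniform initial weights
    model = []
    for _ in range(T):
        j, t, pol, err = fit_stump(X, y, w)
        err = max(err, 1e-10)          # guard against division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(pol * (X[:, j] - t) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred) # exponential-loss weight update
        w /= w.sum()
        model.append((alpha, j, t, pol))
    return model

def adaboost_predict(model, X):
    """Weighted vote of the component classifiers."""
    score = sum(a * np.where(p * (X[:, j] - t) >= 0, 1, -1)
                for a, j, t, p in model)
    return np.where(score >= 0, 1, -1)
```

The final prediction is the sign of the alpha-weighted vote, matching the ensemble combination described in the introduction.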
Noise-detection based AdaBoost
Before introducing our proposed algorithm, two important assumptions related to the noise issue are presented as follows:
Assumption 1: An optimal classifier is expected to correctly classify the clean, i.e., non-noisy, instances.
Assumption 2: An optimal classifier is expected to misclassify the mislabeled, i.e., noisy, instances with respect to their observed labels.
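The k-NN based noise-detection function mentioned earlier can be sketched under these assumptions: an instance is flagged as mislabeled when its label disagrees with the majority label of its k nearest neighbors. This is a hedged illustration of the general idea; the paper's actual detection function may differ in its details:

```python
import numpy as np

def knn_noise_labels(X, y, k=3):
    """Flag instances whose label conflicts with their k nearest neighbors.

    Labels are assumed to be in {-1, +1}; distances are Euclidean.
    """
    n = len(y)
    noisy = np.zeros(n, dtype=bool)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                       # exclude the instance itself
        nbrs = np.argsort(d)[:k]            # indices of k nearest neighbors
        majority = 1 if np.sum(y[nbrs]) >= 0 else -1
        noisy[i] = (majority != y[i])
    return noisy
```

This matches the earlier definition of mislabeled instances as those inconsistent with most of their surrounding neighbors' class labels.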
Experiments
The motivation of ND-AdaBoost is to improve the noise tolerance of conventional AdaBoost. Therefore, our experiments focus primarily on comparing classification accuracy/error at different noise levels. Two groups of experiments are conducted to show the effectiveness of the ND-AdaBoost algorithm in different aspects. The first group of experiments (Section 5.1) demonstrates the superiority of the noise-detection based method over benchmark algorithms in the GML AdaBoost Matlab Toolbox [34] with …
Conclusion
To resolve the incompatibility between the AdaBoost algorithm and noisy instances, this paper designs a noise-detection version of the boosting algorithm. It labels the noisy instances at each iteration and adds a regeneration condition to control the ensemble training error bound. Experimental results with artificially added mislabeled instances show that different noise-identification functions (k-NN and EM based) affect the performance of the ND-AdaBoost algorithm, i.e., the testing result of k-NN …
Acknowledgements
This work is supported by City University Grant 9610025 and City University Strategic Grant 7002680. The authors would like to thank the two anonymous reviewers for their comments, which significantly improved the quality of this paper.
Jingjing Cao received her B.S. degree in Information and Computing Science, Dalian Maritime University, China in 2006 and the M.S. degree in Applied Mathematics from the same university in 2008. She is currently doing Ph.D. in the Department of Computer Science at City University of Hong Kong, Kowloon, Hong Kong. Her research interests are in Ensemble Learning, Evolutionary Algorithms, and their applications.
References (41)
- et al., Cost-sensitive boosting for classification of imbalanced data, Pattern Recognition (2007).
- Supervised projection approach for boosting classifiers, Pattern Recognition (2009).
- et al., Constructing ensembles of classifiers using supervised projection methods based on misclassified instances, Expert Systems with Applications (2011).
- et al., Edited adaBoost by weighted kNN, Neurocomputing (2010).
- et al., Boosting random subspace method, Neural Networks (2008).
- et al., Reduced reward-punishment editing for building ensembles of classifiers, Expert Systems with Applications (2011).
- et al., Extreme learning machine: theory and applications, Neurocomputing (2006).
- et al., An overview of ensemble methods for binary classifiers in multi-class problems: experimental study on one-vs-one and one-vs-all schemes, Pattern Recognition (2011).
- Y. Freund, R.E. Schapire, Experiments with a new boosting algorithm, in: ICML, 1996, pp. ...
- J. Quinlan, Bagging, boosting, and C4.5, in: Proceedings of the National Conference on Artificial Intelligence, 1996, ...
- A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences.
- Cost-sensitive boosting, IEEE Transactions on Pattern Analysis and Machine Intelligence.
- SemiBoost: boosting for semi-supervised learning, IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Ensemble tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Adaboost-based algorithm for network intrusion detection, IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics.
- The strength of weak learnability, Machine Learning.
- Soft margins for adaBoost, Machine Learning.
- An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization, Machine Learning.
Sam Kwong received his BSc and MASc degrees in electrical engineering from the State University of New York at Buffalo, USA, and the University of Waterloo, Canada, in 1983 and 1985, respectively. In 1996, he obtained his PhD from the University of Hagen, Germany. From 1985 to 1987, he was a diagnostic engineer with Control Data Canada, where he designed diagnostic software to detect manufacturing faults of the VLSI chips in the Cyber 430 machine. He later joined Bell Northern Research Canada as a Member of Scientific Staff. In 1990, he joined the City University of Hong Kong as a lecturer in the Department of Electronic Engineering. He is currently a Professor in the Department of Computer Science. His research interests are in Pattern Recognition, Evolutionary Algorithms and Video Coding.
Ran Wang received her Bachelor's degree from School of Information Science & Technology, Beijing Forestry University, Beijing, China, in 2009. She is currently a Ph.D candidate in the Department of Computer Science, City University of Hong Kong. Her research interests focus on machine learning and its related applications.