Pattern Recognition

Volume 45, Issue 12, December 2012, Pages 4451-4465

A noise-detection based AdaBoost algorithm for mislabeled data

https://doi.org/10.1016/j.patcog.2012.05.002

Abstract

Noise sensitivity is a well-known issue of the AdaBoost algorithm. Previous work has shown that AdaBoost is prone to overfitting on noisy data sets because it consistently assigns high weights to hard-to-learn instances (mislabeled instances or outliers). In this paper, a new boosting approach, named noise-detection based AdaBoost (ND-AdaBoost), is proposed to combine classifiers by placing emphasis during training on misclassified noisy instances and correctly classified non-noisy instances. Specifically, the algorithm integrates a noise-detection based loss function into AdaBoost to adjust the weight distribution at each iteration. Two evaluation criteria, based on the k-nearest-neighbor (k-NN) rule and on expectation maximization (EM) respectively, are constructed to detect noisy instances. Further, a regeneration condition is presented and analyzed to control the ensemble training error bound of the proposed algorithm, which provides theoretical support. Finally, we conduct experiments on selected binary UCI benchmark data sets and demonstrate that the proposed algorithm is more robust than standard AdaBoost and its variants on noisy data sets.

Highlights

• A noise-detection based loss function is proposed to distinguish mislabeled data from misclassified instances.
• A new regeneration condition is added to guarantee the generalization performance of the proposed algorithm.
• The proposed methods significantly outperform most of the state-of-the-art methods in the high-noise region.

Introduction

AdaBoost [1], [2], [3] is one of the most popular techniques for generating ensembles due to its adaptability and simplicity. In the past few decades, AdaBoost has been successfully extended to many fields such as cost-sensitive classification [4], [5], semi-supervised learning [6], tracking [7] and network intrusion detection [8]. The main idea of AdaBoost is to construct a succession of weak learners by using different training sets derived from resampling the original data. Through a weighted vote, these learners are combined to predict the class label of a new test instance. Normally, the performance of a weak learner is only slightly better than random guessing [9]. A weak learner used in the ensemble is called a base classifier or component classifier.

However, AdaBoost tends to overfit as the number of combined classifiers increases. Some researchers attributed this failure of AdaBoost to a high proportion of noisy instances [10], [11]. In [10], Rätsch et al. defined three conditions that identify noisy data: (1) overlapping class probability distributions, (2) outliers and (3) mislabeled instances. It should be noted that our work only discusses noisy instances of the mislabeled type. Mislabeled instances typically refer to instances whose class labels are inconsistent with those of most of their surrounding neighbors. Dietterich [11] designed an experimental test that demonstrated the poor generalization of AdaBoost with C4.5 by adding artificial noise. He explained that mislabeled instances are likely to be assigned higher weights, which gives rise to the unsatisfactory performance of AdaBoost.

By analyzing the inner driving force of AdaBoost, one may notice that it essentially minimizes an exponential loss function [12] sequentially. In detail, it penalizes misclassified instances with increased weights while assigning lessened weights to correctly classified instances for the next iteration. In this way, AdaBoost focuses only on punishing misclassified instances while ignoring their mislabeled property, which leads to the noise sensitivity of AdaBoost.
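For reference, the exponential loss that standard AdaBoost minimizes and the resulting weight-update rule can be written as follows; this is the standard formulation, stated in the notation introduced in Section 3 rather than quoted from this paper:

\[
L\bigl(y_n, f(x_n)\bigr) = \exp\bigl(-y_n f(x_n)\bigr), \qquad f(x_n) = \sum_{t=1}^{T} \alpha_t h_t(x_n),
\]

\[
w_{t+1}(n) = \frac{w_t(n)\,\exp\bigl(-\alpha_t\, y_n h_t(x_n)\bigr)}{Z_t}, \qquad \alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t},
\]

where $\varepsilon_t$ is the weighted training error of $h_t$ and $Z_t$ is a normalizing factor. Since $y_n h_t(x_n) = -1$ for a misclassified instance, its weight grows by the factor $e^{\alpha_t}$ whether or not its label is trustworthy, which is exactly the behavior the revised loss function of Section 4 is designed to change.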

Therefore, in this paper, a noise-detection based AdaBoost algorithm (ND-AdaBoost), which takes the mislabeled property of instances into account, is proposed to address the noise sensitivity and overfitting problems. The main contributions of this paper are as follows.

(1) Four types of instances with respect to noise and class label decisions are introduced, in contrast to the conventional taxonomy of misclassified and correctly classified instances. Specifically, they are: correctly classified noisy instances, misclassified noisy instances, misclassified non-noisy instances, and correctly classified non-noisy instances. This division follows the assumption that mislabeled instances should be misclassified with as high a probability as possible, while correctly labeled instances are expected to be classified correctly.

(2) A revised exponential loss function is proposed by considering these types of instances. At each iteration, a noise label determined by a noise-detection function is assigned to each instance. With the new loss function, we aim to minimize the optimization objective by assigning lower weights to misclassified noisy instances and correctly classified non-noisy instances. To identify noisy data, both EM and k-NN based functions are employed to test the noise-labeling effect under different detection methods; a k-NN based sketch is given after this list.

(3) In order to guarantee the generalization ability of the proposed method, a new regeneration condition, based on an analysis of the empirical margin error bound of ND-AdaBoost, is developed to keep the training error bound of the proposed algorithm within a reasonable range.
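Section 1 characterizes mislabeled instances as those whose class labels are inconsistent with most of their surrounding neighbors. In that spirit, the following is a minimal Python sketch of a k-NN based noise detector; the value of k, the Euclidean metric and the majority-vote threshold are illustrative assumptions rather than the paper's exact criterion:

import numpy as np

def knn_noise_labels(X, y, k=5):
    # Flag instances whose label disagrees with the majority of their
    # k nearest neighbors (Euclidean distance). True = detected as noisy.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    noisy = np.zeros(n, dtype=bool)
    # Pairwise squared Euclidean distances, shape (n, n).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    for i in range(n):
        order = np.argsort(d2[i])
        neighbors = [j for j in order if j != i][:k]
        agree = np.sum(y[neighbors] == y[i])
        noisy[i] = agree < k / 2.0   # label in the minority among neighbors
    return noisy

An instance flagged by such a detector receives the noise label that the revised loss function acts on at each boosting iteration.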

The performance of the noise-detection based AdaBoost algorithm is examined through experiments on 13 binary data sets from the UCI repository [13]. Experimental results show that the proposed algorithm outperforms other boosting methods in noisy environments.

Section snippets

Related work

For decades, researchers have made various modifications to the AdaBoost technique to mitigate the detrimental effect of noise, along two directions: (1) revising the optimization objective (loss function) and rebuilding the weight-updating mechanism according to the corresponding loss function; (2) limiting the incremental weight update of noisy instances or discarding them directly. Through these methods, the disturbance originating from mistrusted instances can be minimized, and the noise

Framework of AdaBoost algorithm

Since our work is an extension of the AdaBoost approach, preliminary knowledge of AdaBoost is first introduced in this section. Suppose we have a two-class supervised learning task. Let the n-th training instance be denoted as $z_n=(x_n,y_n)$, $n=1,\ldots,N$, where $x_n\in X$ and $y_n$ is the class label of $x_n$. Let $H$ be a set of component classifiers: $H=\{h_t(x_n): X\to\{-1,+1\},\ t=1,\ldots,T\}$. The pseudo-code of the basic AdaBoost algorithm for binary classification is presented in Algorithm 1.

Algorithm 1

AdaBoost.

Input: A set of labeled training instances: $S=\{(x_n,y_n)\}_{n=1}^{N}$
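The listing above is truncated in this snippet view. For orientation, the following is a minimal Python sketch of the standard binary AdaBoost loop; the decision-stump base learner is an illustrative assumption, since any weak learner with weighted error $\varepsilon_t < 0.5$ can be plugged in:

import numpy as np

def train_stump(X, y, w):
    # Exhaustive search for the threshold stump minimizing weighted error.
    best = (0, 0.0, 1, np.inf)  # (feature, threshold, polarity, error)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, thr, pol, err)
    return best

def adaboost(X, y, T=20):
    # Standard binary AdaBoost; labels y must be in {-1, +1}.
    n = len(y)
    w = np.full(n, 1.0 / n)              # uniform initial weights
    ensemble = []
    for _ in range(T):
        j, thr, pol, err = train_stump(X, y, w)
        if err >= 0.5:                   # weak-learner condition violated
            break
        err = max(err, 1e-10)            # guard against division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
        w = w * np.exp(-alpha * y * pred)  # exponential weight update
        w = w / w.sum()                    # normalization (Z_t)
        ensemble.append((alpha, j, thr, pol))
    return ensemble

def predict(ensemble, X):
    # Combine component classifiers through a weighted vote.
    score = np.zeros(len(X))
    for alpha, j, thr, pol in ensemble:
        score = score + alpha * np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
    return np.sign(score)

ND-AdaBoost modifies the weight update of this loop using the noise labels described in the following section.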

Noise-detection based AdaBoost

Before introducing our proposed algorithm, two important assumptions related to the noise issue are presented as follows:

  • Assumption 1: An optimal classifier is expected to correctly classify clean, i.e., non-noisy, instances.

  • Assumption 2: An optimal classifier is expected to misclassify mislabeled, i.e., noisy, instances.
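Read together with contribution (2) in Section 1, these assumptions fix the direction of the weight update: weights should shrink for misclassified noisy instances and for correctly classified non-noisy instances, and grow for the other two types. The Python sketch below illustrates one way to encode this by flipping the margin sign for detected-noisy instances; the sign flip is an illustrative assumption, not the paper's exact loss function:

import numpy as np

def nd_weight_factor(alpha, y, pred, noisy):
    # Per-instance multiplicative weight factor. margin = +1 for a
    # correctly classified instance, -1 for a misclassified one.
    margin = y * pred
    # For detected-noisy instances the desired outcome is
    # misclassification (Assumption 2), so the margin sign is flipped.
    effective = np.where(noisy, -margin, margin)
    # effective = +1: instance behaves as desired -> weight shrinks;
    # effective = -1: instance behaves undesirably -> weight grows.
    return np.exp(-alpha * effective)

Under this sketch, a misclassified noisy instance (margin $-1$, flipped to $+1$) and a correctly classified non-noisy instance (margin $+1$) both receive a factor smaller than one, matching the goal stated in contribution (2).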

Experiments

The motivation of ND-AdaBoost is to improve the noise tolerance of conventional AdaBoost. Therefore, our experiments focus primarily on comparing classification accuracy/error at different noise levels. Two groups of experiments are conducted to show the effectiveness of the ND-AdaBoost algorithm from different aspects. The first group of experiments (Section 5.1) demonstrates the superiority of the noise-detection based method over benchmark algorithms in the GML AdaBoost Matlab Toolbox [34] with

Conclusion

To address the incompatibility between the AdaBoost algorithm and noisy instances, this paper designs a noise-detection version of the boosting algorithm. It labels the noisy instances at each iteration and adds a regeneration condition to control the ensemble training error bound. The experimental results with artificially added mislabeled instances show that the different noise-identification functions (k-NN and EM based) affect the performance of the ND-AdaBoost algorithm, i.e., the testing result of k-NN

Acknowledgements

This work is supported by City University Grant 9610025 and City University Strategic Grant 7002680. The authors would like to thank the two anonymous reviewers for their comments, which contributed significantly to improving the quality of this paper.

References (41)

  • Y. Freund et al.

    A decision-theoretic generalization of on-line learning and an application to boosting

    Journal of Computer and System Sciences

    (1997)
  • H. Masnadi-Shirazi et al.

    Cost-sensitive boosting

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2010)
  • P.K. Mallapragada et al.

    SemiBoost: boosting for semi-supervised learning

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2009)
  • S. Avidan

    Ensemble tracking

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2007)
  • W. Hu et al.

    AdaBoost-based algorithm for network intrusion detection

    IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics

    (2008)
  • R. Schapire

    The strength of weak learnability

    Machine Learning

    (1990)
  • G. Rätsch et al.

    Soft margins for AdaBoost

    Machine Learning

    (2001)
  • T. Dietterich

    An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization

    Machine Learning

    (2000)
  • C. Bishop

    Pattern Recognition and Machine Learning

    (2006)
  • A. Frank, A. Asuncion, UCI Machine Learning Repository 〈http://archive.ics.uci.edu/ml〉,...

Jingjing Cao received her B.S. degree in Information and Computing Science from Dalian Maritime University, China, in 2006 and her M.S. degree in Applied Mathematics from the same university in 2008. She is currently pursuing a Ph.D. in the Department of Computer Science at City University of Hong Kong, Kowloon, Hong Kong. Her research interests are in Ensemble Learning, Evolutionary Algorithms, and their applications.

Sam Kwong received his B.Sc. degree in electrical engineering from the State University of New York at Buffalo, USA, in 1983 and his M.A.Sc. degree from the University of Waterloo, Canada, in 1985. He obtained his Ph.D. from the University of Hagen, Germany, in 1996. From 1985 to 1987, he was a diagnostic engineer with Control Data Canada, where he designed diagnostic software to detect manufacturing faults of the VLSI chips in the Cyber 430 machine. He later joined Bell Northern Research Canada as a Member of Scientific Staff. In 1990, he joined the City University of Hong Kong as a lecturer in the Department of Electronic Engineering. He is currently a Professor in the Department of Computer Science. His research interests are in Pattern Recognition, Evolutionary Algorithms and Video Coding.

Ran Wang received her Bachelor's degree from the School of Information Science & Technology, Beijing Forestry University, Beijing, China, in 2009. She is currently a Ph.D. candidate in the Department of Computer Science, City University of Hong Kong. Her research interests focus on machine learning and its related applications.
