Elsevier

Information Sciences

Volume 220, 20 January 2013, Pages 46-63

Fuzzy Passive–Aggressive classification: A robust and efficient algorithm for online classification problems

https://doi.org/10.1016/j.ins.2012.06.023

Abstract

Fuzzy weighting, which is designed to reduce the effect of outliers in batch classification problems, may generate unreasonable membership grades, especially for the samples following an input outlier, when incorporated directly into online classification algorithms. In this paper, a generalized framework for online fuzzy weighting is presented, which incrementally calculates the membership of each incoming sample by taking into account the membership grades of previous samples in a pairwise manner. The advocated pairwise-distance based scheme can not only identify possible outliers but also adapt well to the sequentially received samples in the online setting. We apply it directly to the online Passive–Aggressive (PA) algorithm. The resulting Fuzzy Passive–Aggressive (FPA) algorithm achieves classification accuracy comparable to that of the benchmark incremental SVM, while retaining the time efficiency of the simple, Perceptron-like PA. Moreover, FPA exhibits the best performance within the PA family, making it a robust and efficient alternative to PA for dealing with the unavoidable outliers in large-scale or high-dimensional real datasets. The study is supported by a series of experiments on the IDA benchmark repository, as well as on two real-world problems, namely place recognition and radar emitter recognition.

Introduction

Nowadays, machine learning techniques such as k-NN [16], Bayes classifiers [18], [47], decision trees [11], [36], and support vector machines (SVMs) [8], [13], [42] are widely used for classification problems [19]. In practical applications, however, when the data arrive over long periods of time or storage capacity is very limited (so that processing all data at once is impossible), these techniques fail to work well. Online learning is therefore of increasing importance for dealing with endless streams of received data such as stock market indexes, sensor data, and video streams [6]. A system engaged in online learning should be able to incorporate additional training data, as they become available, without re-training from scratch.

In the online setting, samples are received sequentially. On each round, the model predicts a label for the current sample; upon receiving the correct label, it may then choose whether to update the prediction mechanism. The Perceptron [39] is a simple but efficient online learning algorithm, and it has been extensively studied with the goal of enhancing its generalization power. Crammer et al. [15] presented the online Passive–Aggressive (PA) algorithm, which is as fast as, but more accurate than, the Perceptron. It updates the model to attain a low loss on the new sample while keeping the new model close to the current one. Orabona et al. [34] proposed the Projectron algorithm, which is also based on the Perceptron but bounds the number of online hypotheses by projecting new samples onto the space spanned by the previous hypotheses instead of discarding them. In [20], ALMAp was presented, which approximates the maximal-margin hyperplane for a set of linearly separable data. Additionally, SVMs have been modified into many incremental versions [9], [17], [24], which in general formulate an online optimization problem; the motivation of these works is to maintain the KKT conditions by varying the margin vector coefficients in response to the perturbation introduced by the new coefficient. In summary, quite a few online machine learning algorithms have been developed to deal with massive or streaming data.
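The passive/aggressive behavior described above can be sketched concretely. The following is a minimal Python illustration of the PA-I update from Crammer et al. [15] for binary classification: the model stays unchanged when the hinge loss is zero and otherwise takes the smallest step that drives the loss down, with the step size capped by an aggressiveness parameter C.

```python
import numpy as np

def pa_update(w, x, y, C=1.0):
    """One round of the online Passive-Aggressive (PA-I) update.

    The model stays passive when the hinge loss on (x, y) is zero and
    otherwise moves just far enough toward a unit margin, capped by C.
    """
    loss = max(0.0, 1.0 - y * np.dot(w, x))   # hinge loss on the new sample
    if loss == 0.0:
        return w                              # passive step: keep the model
    tau = min(C, loss / np.dot(x, x))         # aggressive step size (PA-I)
    return w + tau * y * x                    # closest model with low loss

# usage: a short stream of (x, y) pairs with labels in {-1, +1}
w = np.zeros(2)
for x, y in [(np.array([1.0, 0.0]), 1), (np.array([0.0, 1.0]), -1)]:
    w = pa_update(w, x, y)
```

After these two rounds the weight vector separates both samples; a third sample that is already classified with sufficient margin leaves the model untouched.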

For real-world problems, models based on fuzzy set theory (FST) have also become attractive and visible, especially in the realms of automatic control and pattern recognition [29], [38]. The vague and imprecise expressions that human beings use to describe practical processes can be modeled gracefully by FST [35]. Fuzzy classifiers (FCs) based on linguistic rules [4], [5], [37], [40] provide a comprehensive way to illustrate the underlying concepts of complicated systems. In [1], an efficient method was discussed for extracting fuzzy rules directly from numerical input–output data for pattern classification. Mencar et al. [31] designed fuzzy rule-based classifiers from data under the constraint of semantic cointension. Moreover, fuzzy rules can also be extracted from trained SVM models [10]. Generally speaking, these FST-supported classifiers are mainly structured around fuzzy rules.

It is of great interest to realize the incremental application of a fuzzy-classification method, aiming to update fuzzy rules or generate new ones in an online way. In [2], a recursive approach using online clustering for adapting the structure of fuzzy-rule based models was discussed; this algorithm is instrumental for the online identification of T–S models. A new approach was proposed in [7] for learning fuzzy rules incrementally: as new data arrive, new rules may be discovered while existing ones may be updated or partially removed. Lughofer [28] presented two families of online evolving classifiers for image classification; the developed classifiers are initially trained on some pre-labeled training data and further updated with newly recorded samples. As a whole, this research line has shown substantial potential for incremental adaptation of model parameters and for the high understandability of fuzzy systems in dynamically changing environments.

However, the research community should not neglect another direction, in which FST is applied directly to machine learning in order to deal with outliers and noise in real-world classification problems. Traditional machine learning approaches assume that every sample is equally important in the training procedure. In realistic applications, however, different samples should contribute differently to the decision boundary of the involved classifiers [46]. To increase the robustness of classifiers, fuzzy weighting is naturally introduced to score each sample. Different from the above fuzzy-classification methods, which assign membership degrees to classifier outputs together with the output class or extract fuzzy rules from trained models, the methods in this category simply use a membership value to define the importance of each training pattern and train the models on the fuzzified input data [3], [30]. In [41], a robust support vector machine was proposed to solve the over-fitting problem. Lin and Wang [25] proposed a fuzzy SVM, which associates each sample with a membership value so that different samples affect the learning of the separating hyperplane differently. Another similar effort can be found in [22]. Chen and Chen [14] introduced a membership function into the kernel Perceptron for solving linearly non-separable problems. Then, in [21], a robust membership calculation scheme was presented, which performs well on noisy data. However, these methods of membership generation are designed only for specific data distributions, or assume that training samples are received batchwise. Extending such strategies directly to online learning may cause new problems, because the distribution information available in the initial stages is not reliable. More importantly, the decision boundary would need to be recalculated on the entire current training set upon receiving each new sample, which is time consuming. To the best of our knowledge, little work has been devoted to this subject. A robust and efficient framework for incremental fuzzy weighting is therefore imperative for online classification.

Our work follows this second direction. We consider online membership generation only in the context of the fuzzified input space. It is worthwhile to highlight the main contributions of this paper:

  (1) A generalized framework of membership calculation is presented, which can be properly integrated into many online classification algorithms. Compared with center-distance or margin-distance based schemes, the advocated pairwise-distance based membership generation scheme can not only identify possible outliers but also adapt better to the sequentially received samples in the online setting.

  (2) The proposed membership generation scheme is incorporated directly into the online Passive–Aggressive algorithm. The resulting Fuzzy Passive–Aggressive (FPA) approach achieves classification performance competitive with the benchmark incremental SVM, especially when the received training samples are insufficient. Moreover, FPA exhibits the best robustness to outliers and label noise within the PA family, as demonstrated by the IDA benchmark tests and two realistic recognition tasks.

  (3) A detailed theoretical analysis of the computational complexity of the related PA algorithms is provided. In particular, we show that FPA shares almost the same time efficiency as the standard, Perceptron-like PA. The slight increase in training time introduced by pairwise-distance based fuzzy weighting is acceptable in the online setting, compared with two other possible fuzzy extensions. These merits make the proposed FPA a robust and efficient alternative to PA for dealing with the unavoidable outliers in large-scale or high-dimensional real datasets.
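The pairwise-distance idea of contribution (1) can be sketched as follows. The paper's exact formula is not reproduced in this excerpt; the code below is an illustrative assumption that captures the stated mechanism: each new sample's grade is driven by its distances to previous samples, with each distance discounted by that sample's own membership, so an earlier outlier (which received a low grade) has little influence on the grades of the samples that follow it.

```python
import numpy as np

class PairwiseMembership:
    """Incremental pairwise-distance membership (illustrative sketch).

    The membership of an incoming sample is an exponential of its
    membership-weighted mean distance to all previous samples; this
    concrete choice is an assumption, not the paper's formula.
    """
    def __init__(self, gamma=1.0):
        self.gamma = gamma
        self.samples = []   # past inputs
        self.grades = []    # their membership grades

    def membership(self, x):
        x = np.asarray(x, dtype=float)
        if not self.samples:
            s = 1.0         # first sample gets full weight
        else:
            d = np.array([np.linalg.norm(x - z) for z in self.samples])
            g = np.array(self.grades)
            mean_d = np.dot(g, d) / g.sum()   # grade-weighted mean distance
            s = float(np.exp(-self.gamma * mean_d))
        self.samples.append(x)
        self.grades.append(s)
        return s

pm = PairwiseMembership(gamma=1.0)
s_inliers = [pm.membership(p) for p in ([0.0, 0.0], [0.1, 0.0])]
s_outlier = pm.membership([10.0, 0.0])   # far from past samples: low grade
s_next = pm.membership([0.2, 0.0])       # the outlier barely drags this down
```

Because the outlier itself carries a near-zero grade, the next normal sample recovers a high membership, which is the adaptation property the center-distance scheme lacks online.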

The rest of this paper is organized as follows. Section 2 sets up the problem and describes our generalized online weighting framework. Section 3 presents the fuzzy PA family, along with a time complexity analysis. A variety of experimental results are presented in Section 4. Finally, Section 5 concludes with a discussion of future work.

Section snippets

Generating membership grades for online learning

Most available membership-related methods (such as fuzzy SVMs [25] and the fuzzy Perceptron [14]) are batch-mode oriented and have shown good performance in machine learning. One may think it straightforward to extend these methods directly to online versions. However, our preliminary work indicates that the membership calculation schemes they use, which sound reasonable for batch learning, are not appropriate for the online setting. To this end, we will examine the limitations of these schemes…
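One such limitation can be demonstrated with a small numerical experiment (an illustrative construction, not taken from the paper): if the batch center-distance rule is reused online by recomputing the centroid over all samples seen so far, a single outlier drags the centroid away, so a perfectly typical sample is graded lower after the outlier than before it, and every round already costs O(t) work.

```python
import numpy as np

def center_membership(X, delta=1e-6):
    # naive online reuse of the batch center-distance rule: on every round
    # the centroid is recomputed over all samples seen so far (O(t) work)
    center = X.mean(axis=0)
    dist = np.linalg.norm(X - center, axis=1)
    return 1.0 - dist / (dist.max() + delta)

inliers = [[0.0, 0.0], [0.2, 0.0], [0.1, 0.1], [0.1, -0.1]]
probe = [0.1, 0.0]                     # a perfectly typical sample

# grade of the probe before any outlier has been seen
grade_before = center_membership(np.array(inliers + [probe]))[-1]

# the same probe, re-graded after one outlier has shifted the centroid
grade_after = center_membership(
    np.array(inliers + [probe, [10.0, 0.0], probe]))[-1]
```

The same input thus receives a different, unreasonably reduced grade purely because of an earlier outlier, which is the failure mode the pairwise-distance scheme is designed to avoid.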

Passive–Aggressive algorithms with fuzzy weighting

In this section, we first give a brief overview of the PA algorithm.2 Then we elaborate the Fuzzy PA family within the established framework of online membership generation, with particular attention to the proposed pairwise-distance based method, formally termed FPA. It is also extended to multiclass scenarios via the same construction as in [15]. Finally, we end this section with a theoretical analysis of the computational complexity.
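Although this excerpt does not reproduce the FPA update rule, a plausible fuzzy-weighted PA-I step can be sketched as follows: the membership grade s of the incoming sample scales the aggressiveness cap, so a suspected outlier (small s) moves the hyperplane only slightly. This scaling choice is an assumption for illustration, not the paper's exact rule.

```python
import numpy as np

def fuzzy_pa_update(w, x, y, s, C=1.0):
    """Fuzzy-weighted PA-I step (illustrative): the membership grade
    s in (0, 1] scales the aggressiveness cap C, so low-grade samples
    (suspected outliers) produce only small model updates.
    """
    loss = max(0.0, 1.0 - y * np.dot(w, x))
    if loss == 0.0:
        return w                          # passive: no update needed
    tau = min(s * C, loss / np.dot(x, x)) # step size shrunk by membership
    return w + tau * y * x

w = np.zeros(2)
# a clean sample (s = 1) moves the model; a suspected outlier barely does
w = fuzzy_pa_update(w, np.array([1.0, 0.0]), 1, s=1.0)
w_out = fuzzy_pa_update(w, np.array([0.0, 1.0]), -1, s=0.01)
```

With s = 1 this reduces exactly to the standard PA-I step, so the fuzzy variant only pays the extra cost of computing the membership grade.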

Experimental results

To test the performance of the proposed method, we first run it on several publicly available benchmark datasets from the IDA Repository, which are commonly used in the machine learning community. Then, we apply it to two more realistic scenarios, namely place recognition and radar emitter recognition.

We compare the fuzzy PA algorithms, including PA_fuzzy1 (center-distance based), PA_fuzzy2 (margin-distance based), and the proposed FPA (pairwise-distance based), to further…

Conclusion

This paper presents a novel strategy for online membership generation, which is particularly suitable for coping with the unavoidable outliers in online classification problems. Based on pairwise distances, the proposed scheme evaluates the importance of each input sample using only the information available at the last time step and the information of the current sample. In this way, the influence of a current outlier on the following samples is significantly reduced. Moreover, the…

Acknowledgements

The authors are grateful to the reviewers and the Editor-in-Chief for their valuable comments and suggestions, which helped greatly improve the technical quality of this paper. Our special thanks go to one of the reviewers for the suggestion on the initial membership value, which is very helpful in increasing the reliability of our weighting scheme.

References (47)

  • J.A. Sanz et al., Improving the performance of fuzzy rule-based classification systems with interval-valued fuzzy sets and genetic amplitude tuning, Information Sciences (2010)
  • J. Xiao et al., A dynamic classifier ensemble selection approach for noise data, Information Sciences (2010)
  • M.L. Zhang et al., Feature selection for multi-label naive Bayes classification, Information Sciences (2009)
  • S. Abe, A method for fuzzy rules extraction directly from numerical data and its application to pattern classification, IEEE Transactions on Fuzzy Systems (1995)
  • R. Batuwita et al., FSVM-CIL: fuzzy support vector machines for class imbalance learning, IEEE Transactions on Fuzzy Systems (2010)
  • A. Bouchachia, Learning with incrementality, in: The 13th International Conference on Neural Information Processing,...
  • A. Bouchachia et al., Towards incremental fuzzy classifiers, Soft Computing (2007)
  • C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery (1998)
  • G. Cauwenberghs et al., Incremental and decremental support vector machine learning (2000)
  • C.C. Chang, C.J. Lin, LIBSVM: A Library for Support Vector Machines, 2001....
  • O. Chapelle et al., Support vector machines for histogram-based image classification, IEEE Transactions on Neural Networks (1999)
  • J.H. Chen et al., Fuzzy kernel perceptron, IEEE Transactions on Neural Networks (2002)
  • K. Crammer et al., Online passive–aggressive algorithms, Journal of Machine Learning Research (2006)