Elsevier

Information Sciences

Volume 220, 20 January 2013, Pages 46-63

Fuzzy Passive–Aggressive classification: A robust and efficient algorithm for online classification problems

https://doi.org/10.1016/j.ins.2012.06.023

Abstract

Fuzzy weighting, which is designed to reduce the effect of outliers in batch classification problems, may generate unreasonable membership grades, especially for the samples following an input outlier, when incorporated directly into online classification algorithms. In this paper, a generalized framework for online fuzzy weighting is presented, which incrementally calculates the membership of each incoming sample by taking into account the membership grades of previous samples in a pairwise manner. The advocated pairwise-distance based scheme can not only identify possible outliers but also adapt well to the sequentially received samples in the online setting. We apply it directly to the online Passive–Aggressive (PA) algorithm. The resulting Fuzzy Passive–Aggressive (FPA) algorithm achieves classification accuracy comparable to that of the benchmark incremental SVM, while retaining the time efficiency of the simple, Perceptron-like PA. Moreover, FPA exhibits the best performance within the PA family, making it a robust and efficient alternative to PA for dealing with the unavoidable outliers in large-scale or high-dimensional real datasets. The study is supported by a series of experiments on the IDA benchmark repository, as well as on two real-world problems, namely place recognition and radar emitter recognition.

Introduction

Nowadays, machine learning techniques such as k-NN [16], Bayes classifiers [18], [47], decision trees [11], [36], and support vector machines (SVMs) [8], [13], [42] are widely used for classification problems [19]. In practical applications, however, when the data arrive over long periods of time or storage capacity is very limited (so that processing all data at once is impossible), these techniques fail to work well. Online learning is therefore of increasing importance for dealing with endless streams of received data such as stock market indexes, sensor data, and video streams [6]. A system engaged in online learning should be able to incorporate additional training data, as they become available, without re-training from scratch.

In the online setting, samples are received sequentially. On each round, the model predicts a label for the current sample; upon receiving the correct label, it may then choose whether to update the prediction mechanism. The Perceptron [39] is a simple but efficient online learning algorithm, and it has been extensively studied with the goal of enhancing its generalization power. Crammer et al. [15] presented the online Passive–Aggressive (PA) algorithm, which is as fast as, but more accurate than, the Perceptron. It updates the model to attain a low loss on the new sample while keeping the new model close to the current one. Orabona et al. [34] proposed the Projectron algorithm, which is also based on the Perceptron but bounds the number of online hypotheses by projecting new samples onto the space spanned by the previous hypotheses instead of discarding them. In [20], ALMAp was presented, which approximates the maximal-margin hyperplane for a set of linearly separable data. Additionally, SVMs have been modified into many incremental versions [9], [17], [24], which in general formulate an online optimization problem; the motivation of these works is to maintain the KKT conditions by varying the margin vector coefficients in response to the perturbation introduced by the new coefficient. In summary, quite a few online machine learning algorithms have been developed to deal with massive or streaming data.
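The passive/aggressive behavior described above can be sketched concretely. The following is a minimal Python illustration of the PA-I update from Crammer et al. [15] for binary classification: the model stays unchanged when the hinge loss is zero and otherwise takes the smallest step that drives the loss down, with the step size capped by an aggressiveness parameter C.

```python
import numpy as np

def pa_update(w, x, y, C=1.0):
    """One round of the online Passive-Aggressive (PA-I) update.

    The model stays passive when the hinge loss on (x, y) is zero and
    otherwise moves just far enough toward a unit margin, capped by C.
    """
    loss = max(0.0, 1.0 - y * np.dot(w, x))   # hinge loss on the new sample
    if loss == 0.0:
        return w                              # passive step: keep the model
    tau = min(C, loss / np.dot(x, x))         # aggressive step size (PA-I)
    return w + tau * y * x                    # closest model with low loss

# usage: a short stream of (x, y) pairs with labels in {-1, +1}
w = np.zeros(2)
for x, y in [(np.array([1.0, 0.0]), 1), (np.array([0.0, 1.0]), -1)]:
    w = pa_update(w, x, y)
```

After these two rounds the weight vector separates both samples; a third sample that is already classified with sufficient margin leaves the model untouched.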

For real-world problems, models based on fuzzy set theory (FST) have also become attractive and visible, especially in the realms of automatic control and pattern recognition [29], [38]. The vague and imprecise expressions that human beings use to describe practical processes can be modeled gracefully by FST [35]. Fuzzy classifiers (FCs) based on linguistic rules [4], [5], [37], [40] provide a comprehensive way to illustrate the underlying concepts of complicated systems. In [1], an efficient method was discussed for extracting fuzzy rules directly from numerical input–output data for pattern classification. Mencar et al. [31] designed fuzzy rule-based classifiers from data under the constraint of semantic cointension. Moreover, fuzzy rules can also be extracted from trained SVM models [10]. Generally speaking, these FST-supported classifiers are mainly structured around fuzzy rules.

It is of great interest to realize the incremental application of a fuzzy-classification method, aiming to update fuzzy rules or generate new ones in an online way. In [2], a recursive approach using online clustering for adapting the structure of fuzzy-rule based models was discussed; this algorithm is instrumental for the online identification of T–S models. A new approach was proposed in [7] for learning fuzzy rules incrementally: as new data arrive, new rules may be discovered while existing ones may be updated or partially removed. Lughofer [28] presented two families of online evolving classifiers for image classification; the developed classifiers are initially trained on some pre-labeled training data and further updated with newly recorded samples. As a whole, this research line has shown substantial potential for incremental adaptation of model parameters and for the high understandability of fuzzy systems in dynamically changing environments.

However, the research community should not neglect another direction, in which FST is applied directly to machine learning in order to deal with outliers and noise in real-world classification problems. Traditional machine learning approaches assume that every sample is equally important in the training procedure. In realistic applications, however, different samples should contribute differently to the decision boundary of the involved classifiers [46]. To increase the robustness of classifiers, fuzzy weighting is naturally introduced to score each sample. Different from the above fuzzy-classification methods, which assign membership degrees to classifier outputs together with the output class or extract fuzzy rules from trained models, the methods in this category simply use a membership value to define the importance of each training pattern and train the models on the fuzzified input data [3], [30]. In [41], a robust support vector machine was proposed to solve the over-fitting problem. Lin and Wang [25] proposed a fuzzy SVM, which associates each sample with a membership value so that different samples affect the learning of the separating hyperplane differently. Another similar effort can be found in [22]. Chen and Chen [14] introduced a membership function into the kernel Perceptron for solving linearly non-separable problems. Then, in [21], a robust membership calculation scheme was presented, which performs well on noisy data. However, these methods of membership generation are designed only for specific data distributions, or assume that training samples are received batchwise. Extending such strategies directly to online learning may cause new problems, because the distribution information available in the initial stages is not reliable. More importantly, the decision boundary would need to be recalculated on the entire current training set upon receiving each new sample, which is time consuming. To the best of our knowledge, little work has been devoted to this subject. A robust and efficient framework for incremental fuzzy weighting is therefore imperative for online classification.

Our work follows this second direction. We consider online membership generation only in the context of the fuzzified input space. It is worthwhile to highlight the main contributions of this paper:

  (1) A generalized framework of membership calculation is presented, which can be properly integrated into many online classification algorithms. Compared with center-distance or margin-distance based schemes, the advocated pairwise-distance based membership generation scheme can not only identify possible outliers but also adapt better to the sequentially received samples in the online setting.

  (2) The proposed membership generation scheme is incorporated directly into the online Passive–Aggressive algorithm. The resulting Fuzzy Passive–Aggressive (FPA) approach achieves classification performance competitive with the benchmark incremental SVM, especially when the received training samples are insufficient. Moreover, FPA exhibits the best robustness to outliers and label noise within the PA family, as demonstrated by the IDA benchmark tests and two realistic recognition tasks.

  (3) A detailed theoretical analysis of the computational complexity of the related PA algorithms is provided. In particular, we show that FPA shares almost the same time efficiency as the standard, Perceptron-like PA. The slight increase in training time introduced by pairwise-distance based fuzzy weighting is acceptable in the online setting, compared with two other possible fuzzy extensions. These merits make the proposed FPA a robust and efficient alternative to PA for dealing with the unavoidable outliers in large-scale or high-dimensional real datasets.
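The pairwise-distance idea of contribution (1) can be sketched as follows. The paper's exact formula is not reproduced in this excerpt; the code below is an illustrative assumption that captures the stated mechanism: each new sample's grade is driven by its distances to previous samples, with each distance discounted by that sample's own membership, so an earlier outlier (which received a low grade) has little influence on the grades of the samples that follow it.

```python
import numpy as np

class PairwiseMembership:
    """Incremental pairwise-distance membership (illustrative sketch).

    The membership of an incoming sample is an exponential of its
    membership-weighted mean distance to all previous samples; this
    concrete choice is an assumption, not the paper's formula.
    """
    def __init__(self, gamma=1.0):
        self.gamma = gamma
        self.samples = []   # past inputs
        self.grades = []    # their membership grades

    def membership(self, x):
        x = np.asarray(x, dtype=float)
        if not self.samples:
            s = 1.0         # first sample gets full weight
        else:
            d = np.array([np.linalg.norm(x - z) for z in self.samples])
            g = np.array(self.grades)
            mean_d = np.dot(g, d) / g.sum()   # grade-weighted mean distance
            s = float(np.exp(-self.gamma * mean_d))
        self.samples.append(x)
        self.grades.append(s)
        return s

pm = PairwiseMembership(gamma=1.0)
s_inliers = [pm.membership(p) for p in ([0.0, 0.0], [0.1, 0.0])]
s_outlier = pm.membership([10.0, 0.0])   # far from past samples: low grade
s_next = pm.membership([0.2, 0.0])       # the outlier barely drags this down
```

Because the outlier itself carries a near-zero grade, the next normal sample recovers a high membership, which is the adaptation property the center-distance scheme lacks online.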

The rest of this paper is organized as follows. Section 2 sets up the problem and describes our generalized online weighting framework. Section 3 presents the fuzzy PA family, along with a time complexity analysis. A variety of experimental results are presented in Section 4. Finally, Section 5 concludes with a discussion of future work.

Section snippets

Generating membership grades for online learning

Most available membership-related methods (such as fuzzy SVMs [25] and the fuzzy Perceptron [14]) are batch-mode oriented and have shown good performance in machine learning. One may think it straightforward to extend these methods directly to online versions. However, our preliminary work indicates that the membership calculation schemes they use, which sound reasonable for batch learning, are not appropriate for the online setting. To this end, we will examine the limitations of these schemes…
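One such limitation can be demonstrated with a small numerical experiment (an illustrative construction, not taken from the paper): if the batch center-distance rule is reused online by recomputing the centroid over all samples seen so far, a single outlier drags the centroid away, so a perfectly typical sample is graded lower after the outlier than before it, and every round already costs O(t) work.

```python
import numpy as np

def center_membership(X, delta=1e-6):
    # naive online reuse of the batch center-distance rule: on every round
    # the centroid is recomputed over all samples seen so far (O(t) work)
    center = X.mean(axis=0)
    dist = np.linalg.norm(X - center, axis=1)
    return 1.0 - dist / (dist.max() + delta)

inliers = [[0.0, 0.0], [0.2, 0.0], [0.1, 0.1], [0.1, -0.1]]
probe = [0.1, 0.0]                     # a perfectly typical sample

# grade of the probe before any outlier has been seen
grade_before = center_membership(np.array(inliers + [probe]))[-1]

# the same probe, re-graded after one outlier has shifted the centroid
grade_after = center_membership(
    np.array(inliers + [probe, [10.0, 0.0], probe]))[-1]
```

The same input thus receives a different, unreasonably reduced grade purely because of an earlier outlier, which is the failure mode the pairwise-distance scheme is designed to avoid.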

Passive–Aggressive algorithms with fuzzy weighting

In this section, we first give a brief overview of the PA algorithm.2 Then we elaborate the Fuzzy PA family within the established framework of online membership generation, with particular attention to the proposed pairwise-distance based method, formally termed FPA. It is also extended to multiclass scenarios via the same construction as in [15]. Finally, we end this section with a theoretical analysis of the computational complexity.
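Although this excerpt does not reproduce the FPA update rule, a plausible fuzzy-weighted PA-I step can be sketched as follows: the membership grade s of the incoming sample scales the aggressiveness cap, so a suspected outlier (small s) moves the hyperplane only slightly. This scaling choice is an assumption for illustration, not the paper's exact rule.

```python
import numpy as np

def fuzzy_pa_update(w, x, y, s, C=1.0):
    """Fuzzy-weighted PA-I step (illustrative): the membership grade
    s in (0, 1] scales the aggressiveness cap C, so low-grade samples
    (suspected outliers) produce only small model updates.
    """
    loss = max(0.0, 1.0 - y * np.dot(w, x))
    if loss == 0.0:
        return w                          # passive: no update needed
    tau = min(s * C, loss / np.dot(x, x)) # step size shrunk by membership
    return w + tau * y * x

w = np.zeros(2)
# a clean sample (s = 1) moves the model; a suspected outlier barely does
w = fuzzy_pa_update(w, np.array([1.0, 0.0]), 1, s=1.0)
w_out = fuzzy_pa_update(w, np.array([0.0, 1.0]), -1, s=0.01)
```

With s = 1 this reduces exactly to the standard PA-I step, so the fuzzy variant only pays the extra cost of computing the membership grade.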

Experimental results

To test the performance of the proposed method, we first run it on several publicly available benchmark datasets from the IDA Repository, which are commonly used in the machine learning community. Then, we apply it to two more realistic scenarios, namely place recognition and radar emitter recognition.

We compare the fuzzy PA algorithms, including PA_fuzzy1 (center-distance based), PA_fuzzy2 (margin-distance based), and the proposed FPA (pairwise-distance based), to further…

Conclusion

This paper presents a novel strategy for online membership generation, which is particularly suitable for coping with the unavoidable outliers in online classification problems. Based on pairwise distances, the proposed scheme evaluates the importance of each input sample using only the information available at the last time step and the information of the current sample. In this way, the influence of a current outlier on the following samples is significantly reduced. Moreover, the…

Acknowledgements

The authors are grateful to the reviewers and the Editor-in-Chief for their valuable comments and suggestions, which helped greatly improve the technical quality of this paper. Our special thanks go to one of the reviewers for the suggestion on the initial membership value, which is very helpful in increasing the reliability of our weighting scheme.

References (47)

  • J.A. Sanz et al., Improving the performance of fuzzy rule-based classification systems with interval-valued fuzzy sets and genetic amplitude tuning, Information Sciences (2010)
  • J. Xiao et al., A dynamic classifier ensemble selection approach for noise data, Information Sciences (2010)
  • M.L. Zhang et al., Feature selection for multi-label naive Bayes classification, Information Sciences (2009)
  • S. Abe, A method for fuzzy rules extraction directly from numerical data and its application to pattern classification, IEEE Transactions on Fuzzy Systems (1995)
  • R. Batuwita et al., FSVM-CIL: fuzzy support vector machines for class imbalance learning, IEEE Transactions on Fuzzy Systems (2010)
  • A. Bouchachia, Learning with incrementality, in: The 13th International Conference on Neural Information Processing,...
  • A. Bouchachia et al., Towards incremental fuzzy classifiers, Soft Computing (2007)
  • C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery (1998)
  • G. Cauwenberghs et al., Incremental and decremental support vector machine learning (2000)
  • C.C. Chang, C.J. Lin, LIBSVM: A Library for Support Vector Machines, 2001....
  • O. Chapelle et al., Support vector machines for histogram-based image classification, IEEE Transactions on Neural Networks (1999)
  • J.H. Chen et al., Fuzzy kernel perceptron, IEEE Transactions on Neural Networks (2002)
  • K. Crammer et al., Online passive–aggressive algorithms, Journal of Machine Learning Research (2006)