Clustering-based ensembles for one-class classification
Introduction
Well-known and reliable classifiers tend to fail when faced with new problems such as an atypical class distribution, non-stationary environments, or massive data. Therefore, new methods must be developed to deal with the challenges arising and improve the quality of real-life decision support systems.
One of these newly introduced methodologies is known as one-class classification (OCC) [31], which assumes that during the training stage only objects originating from a single class, referred to as the target concept, are available. The purpose of OCC is to compute a decision boundary that encloses all available data samples, thereby describing the concept [53]. During the execution phase, new objects unseen during training may appear. These may originate from one or more distributions and represent data outside the target concept; such objects are referred to as outliers.
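As a minimal illustration of this setup (the model and parameter values below are assumptions for the sketch, not components prescribed by this paper), a one-class model such as scikit-learn's OneClassSVM is fitted on target samples alone and then labels unseen objects as targets (+1) or outliers (-1):

```python
# Sketch of one-class classification: train on target-class samples only,
# then flag objects falling outside the learned boundary as outliers.
# OneClassSVM and its parameters are illustrative choices.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
target_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # target concept only

model = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05)
model.fit(target_train)  # no counter-examples are needed

# At execution time, unseen objects are labeled +1 (target) or -1 (outlier).
near = np.array([[0.1, -0.2]])  # close to the training distribution
far = np.array([[8.0, 8.0]])    # far outside the enclosing boundary
print(model.predict(near))  # [1]
print(model.predict(far))   # [-1]
```

The `nu` parameter bounds the fraction of training targets allowed outside the boundary, which is how such models trade off boundary tightness against rejecting their own class.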
For a single OCC classifier it may be difficult or even impossible to find a good model, owing to limited training data, high feature space dimensionality, and/or the properties of the particular classifier. To avoid an overly complex model that overfits the target training data, one can build a simpler model with fewer features, or one trained on smaller chunks of data. Although this reduces the model's complexity, its quality also declines significantly. However, it has been shown that a group of individual OCC models can help alleviate these problems.
Here one may use an approach known as multiple classifier systems (MCSs), which is considered one of the fastest growing fields in machine learning [26]. MCSs are based on the idea of combining several classifiers into a compound recognition system that can exploit the strengths of the individual predictors [60]. Each classifier may output a different decision boundary, and so have a different area of competence over the analyzed dataset [7]. When combined, the collective decision can outperform any of the individual predictors. However, several important issues must be considered when designing an MCS, such as selecting the individual classifiers and choosing a fusion method to establish the group decision. Ideally, the classifiers used to create the ensemble should be highly accurate and should complement each other (i.e., the ensemble should display high diversity). Adding classifiers that are not diverse with respect to those already in the pool will not improve the accuracy of the compound classifier, but will only increase the overall computational cost [5]. It is worth noting that some combination rules, for example majority voting, can even lead to a deterioration in the performance of the ensemble [36]. On the other hand, building an MCS from highly diverse but poor-quality classifiers will result in a weak committee. Therefore, classifier selection is a critical step in the ensemble design process [15].
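As a toy illustration of the fusion step mentioned above, majority voting simply returns the label supported by most ensemble members; this sketch is generic and not specific to the method proposed in this paper:

```python
# Majority-voting fusion: the committee label is the one predicted by the
# largest number of ensemble members.
from collections import Counter

def majority_vote(decisions):
    """Return the label predicted by the largest number of members."""
    return Counter(decisions).most_common(1)[0][0]

# Three hypothetical members disagree on an object; the committee decision
# overrules the single dissenting member.
print(majority_vote(["target", "outlier", "target"]))  # prints: target
```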
MCSs are an attractive, yet still largely unexplored, alternative for OCC problems. Most existing works concentrate on practical applications of OCC ensembles. Much remains to be done to gain insight into the theoretical background of this problem, as well as to establish how to build efficient OCC ensembles regardless of the intended application [35].
We propose an approach based on the idea of data clustering in the feature space: the target class is partitioned into clusters, and an OCC model is built on each of them. In this way we ensure that the pool of predictors is highly diverse and mutually complementary (owing to training on different inputs, i.e., clusters of training objects). This can be seen as an extension of the popular family of ensembles derived from the clustering-and-selection idea proposed by Kuncheva [37]. So far, two other research teams have worked on this topic, proposing very simple hybrid methods for combining clustering and OCC [38], [45].
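The core idea can be sketched as follows. The concrete components here (k-means for the clustering step, OneClassSVM members, and a fusion rule that accepts an object if any member accepts it) are illustrative placeholders, since the framework deliberately leaves these choices to the user:

```python
# Sketch of a clustering-based one-class ensemble: partition the target
# class, train one one-class model per cluster, fuse the member decisions.
# All component choices below are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
# Target class drawn from two well-separated modes, a case where a single
# global one-class model tends to enclose empty space between the modes.
target = np.vstack([
    rng.normal(-5.0, 0.5, size=(150, 2)),
    rng.normal(5.0, 0.5, size=(150, 2)),
])

# Step 1: partition the target class in the feature space.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(target)

# Step 2: train one one-class model per cluster, yielding diverse members
# because each sees a different subset of the training objects.
members = [
    OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(target[labels == k])
    for k in range(2)
]

# Step 3: fuse the members; here an object is accepted as a target if any
# cluster-level model accepts it.
def ensemble_predict(x):
    return 1 if any(m.predict(x)[0] == 1 for m in members) else -1

print(ensemble_predict(np.array([[5.0, 5.0]])))  # inside one mode: 1
print(ensemble_predict(np.array([[0.0, 0.0]])))  # between the modes: -1
```

Because each member only has to describe one compact cluster, its boundary can be much tighter than a single model stretched over the whole target class.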
The contributions of this work are as follows:
- We propose building an ensemble of one-class classifiers based on clustering of the target class. This ensures initial diversity among the classifiers in the pool (as they are trained on different inputs) and the correct handling of possible issues embedded in the nature of the data, such as a rare distribution or data arriving in chunks.
- We propose an elastic and efficient framework for this task, which requires only the selection of several components, namely, the clustering algorithm, the individual classifier model, and the fusion method. These can easily be chosen by the user, as there are practically no limitations on their nature. All other parameters of the method are selected automatically.
- We discuss the possibility of extending our one-class ensemble into an efficient tool for multi-class problems.
- We carry out extensive computational tests on a diverse set of benchmarks that highlight the influence of component selection on the overall quality of the method and show that the proposed approach outperforms standard OCC methods as well as a single multi-class support vector machine (SVM) in multi-class classification problems.
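To make the multi-class decomposition baseline concrete, the following sketch assigns one one-class model per class and labels a new object with the class whose model reports the highest support. The specific model, its parameters, and the use of the iris data are illustrative assumptions, not this paper's experimental setup:

```python
# One-class decomposition of a multi-class problem: one model per class,
# each trained only on its own class's samples; classification picks the
# class whose model yields the highest support. Components are illustrative.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import OneClassSVM

X, y = load_iris(return_X_y=True)
models = {
    c: OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X[y == c])
    for c in np.unique(y)
}

def classify(x):
    """Assign the class whose one-class model gives the highest support."""
    return max(models, key=lambda c: models[c].decision_function(x)[0])

sample = X[:1]            # a setosa sample (class 0)
print(classify(sample))   # its own class model supports it most strongly
```

A practical caveat worth noting: raw support values of independently trained one-class models are not calibrated against each other, which is one reason the choice of fusion method matters in such decompositions.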
Our ensemble is easy to use in many practical applications where it is difficult or even impossible to obtain counter-examples (e.g., machine fault diagnosis), or where, owing to a complex data distribution, the class decomposition approach can lead to a significant improvement in recognition quality over the well-known multi-class approaches (e.g., imbalanced classification).
This paper is organized as follows. In the next section the idea of OCC is presented. In Section 3 the architecture of the proposed compound recognition system is explained. The components that must be selected as input for the system are also presented. In Section 4 the experimental results are presented and discussed. The paper ends with the presentation of our conclusions in Section 5.
One-class classification
OCC aims to distinguish the target concept objects from possible outliers, and hence it is often referred to as learning in the absence of counter-examples. Although OCC is quite similar to binary classification, the primary difference lies in how the one-class classifier is trained. In standard dichotomy problems it is expected that objects from the other classes tend to come from one direction. Here the available class must be separated from all the possible outliers, which leads to a
Architectures for the proposed method
In this paper we propose a new architecture for creating ensembles of one-class classifiers based on the clustering of a feature space into smaller partitions. Additionally, we incorporate our new compound classifier into an architecture that allows both one-class and multi-class problems to be solved. Therefore, in this section we describe our algorithm from two different perspectives – a local perspective (the details of the introduced one-class clustering based ensemble) and a global
Experimental investigation
In this section, we present the results of a thorough experimental investigation examining the behavior of the proposed one-class ensemble approach. The aim of the experiments was to assess the quality of the OCClustE components tested (clustering methods, classification algorithms, and fusers) and to compare the proposed method with known approaches for multi-class decomposition using one-class classifiers, i.e., where a single one-class classifier is assigned to each of the classes.
Our aim is
Conclusion and future work
This paper presented a method for creating a one-class classifier ensemble based on feature space partitioning. We proposed a two-level architecture for the design of such a classification system. The main advantage of the proposed method is that the combined classifiers trained on the basis of clusters allow us to exploit individual classifier strengths. As a result, these usually outperform traditional methods for one-class classifier combinations for multi-class classification problems
Acknowledgments
The work was supported by the Polish National Science Centre under Grant No. N519 576638 for the years 2010–2013, as well as by the Polish National Science Centre Grant No. DEC-2011/01/B/ST6/01994.
References (65)
- FCM: the fuzzy c-means clustering algorithm, Comput. Geosci. (1984)
- The impact of diversity on the accuracy of evidential classifier ensembles, Int. J. Approx. Reason. (2012)
- Soft clustering using weighted one-class support vector machines, Pattern Recogn. (2009)
- Dynamic fusion method using localized generalization error model, Inform. Sci. (2012)
- A competitive ensemble pruning approach based on cross-validation technique, Knowl.-Based Syst. (2013)
- An overview of ensemble methods for binary classifiers in multi-class problems: experimental study on one-vs-one and one-vs-all schemes, Pattern Recogn. (2011)
- Supervised subspace projections for constructing ensembles of classifiers, Inform. Sci. (2012)
- Intrusion detection in computer networks by a modular ensemble of one-class classifiers, Inf. Fusion (2008)
- Minimum spanning tree based one-class classifier, Neurocomputing (2009)
- Cueing, feature discovery, and one-class learning for synthetic aperture radar automatic target recognition, Neural Netw. (1995)
- One-class document classification via neural networks, Neurocomputing
- VOCs classification based on the committee of classifiers coupled with single sensor signals, Chemometr. Intell. Lab. Syst.
- Improving fuzzy c-means clustering based on feature-weight learning, Pattern Recogn. Lett.
- Soft computing methods applied to combination of one-class classifiers, Neurocomputing
- Coding and decoding strategies for multi-class learning problems, Inform. Fusion
- A survey of multiple classifier systems as hybrid systems, Inform. Fusion
- Combined 5 × 2 cv F test for comparing supervised classification learning algorithms, Neural Comput.
- Outliers analysis and one-class classification approach for planetary gearbox diagnosis, J. Phys.: Conf. Ser.
- Pattern Recognition With Fuzzy Objective Function Algorithms
- LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol.
- One-cluster clustering based data description, Jisuanji Xuebao/Chinese J. Comput.
- One-class SVM for learning in image retrieval, IEEE Int. Conf. Image Process.
- Fuzzy one-mean algorithm: formulation, convergence analysis, and applications, J. Intell. Fuzzy Syst.
- Novelty detection using one-class Parzen density estimator: an application to surveillance of nosocomial infections, Stud. Health Technol. Inform.
- One-class support vector ensembles for image segmentation and classification, J. Math. Imag. Vis.
- Solving multiclass learning problems via error-correcting output codes, J. Artif. Int. Res.
- A fast k-means implementation using coresets, Int. J. Comput. Geometry Appl.
- One-class novelty detection for seizure analysis from intracranial EEG, J. Mach. Learn. Res.
- The random subspace method for constructing decision forests, IEEE Trans. Pattern Anal. Mach. Intell.