
1 Introduction

One-Class Classification is considered one of the most challenging areas of machine learning. It has attracted considerable attention and appears in many practical applications such as medical analysis [1], face recognition [2], and authorship verification [3].

To solve one-class classification problems, several methods have been proposed and different concrete models have been constructed. However, a key limitation of the existing categories of one-class classification methods is that none of them considers the full scale of the available information. In boundary-based methods, such as the One-Class Support Vector Machine (OSVM) [4] or Support Vector Data Description (SVDD) [5], only boundary data points are used to build the model, and the overall spread of the class is not taken into account.

Besides, unlike in multi-class classification problems, the low variance directions of the target class distribution are crucial for one-class classification. In [6], it has been shown that projecting the data onto the high variance directions (as PCA does) results in higher error (bias), while retaining the low variance directions lowers the total error. As a solution, Naimul Mefraz Khan et al. proposed in [7] to put more emphasis on the low variance directions while keeping the basic formulation of OSVM untouched, so that the problem remains a convex optimization with a unique global solution that can be reached easily using numerical methods. The resulting Covariance Guided One-Class Support Vector Machine (COSVM) is a powerful kernel method for one-class classification, inspired by the Support Vector Machine (SVM), in which the covariance matrix, estimated in the kernel space, is incorporated into the dual optimization problem of OSVM. In terms of classification performance, COSVM has been shown to outperform SVDD and OSVM. However, some difficulties remain when applying COSVM to real-world problems where data are obtained sequentially and learning has to start from the first available samples. Moreover, COSVM requires a large amount of memory and an enormous amount of training time, especially for large datasets.

Implementations of existing one-class classification methods assume that all the data are provided in advance and that the learning process is carried out in a single step. Hence, these techniques are referred to as batch learning. Because of this limitation, batch techniques show a serious performance degradation in real-world applications where data are not available from the very beginning. For such situations, a new learning strategy is required. As opposed to batch learning, incremental learning is more effective when dealing with non-stationary or very large amounts of data. Thus, it finds its application in a great variety of situations such as visual tracking [8], software project estimation [9], and brain computer interfacing [10].

Incremental learning has been defined in [11] by four criteria:

  1. it should be able to learn additional information from new data;

  2. it should not require access to the original data;

  3. it should preserve previously acquired knowledge and use it to update an existing classifier;

  4. it should be able to accommodate new outliers and target samples.

Several learning algorithms have been studied and modified into incremental procedures able to learn over time. Cauwenberghs and Poggio [12] proposed an online learning algorithm for the Support Vector Machine (SVM). Their algorithm changes the coefficients of the original Support Vectors (SVs) and retains the Karush-Kuhn-Tucker (KKT) conditions on all previously seen training data as each new sample is acquired. Their approach has been extended by Laskov et al. [13] to OSVM; however, the performance evaluation was only based on multi-class SVM. Manuel Davy et al. introduced in [14] an online SVM for abnormal event detection; they proposed a strategy to perform abnormality detection over various signals by extracting relevant features from the considered signal and detecting novelty using an incremental procedure. Incremental SVDD, proposed in [15], is also based on controlling the variation of the KKT conditions as new samples are added. Another approach to improve the classification performance is introduced in [16]: Incremental Weighted One-Class Support Vector Machine (WOCSVM) is an extension of incremental OSVM that assigns a weight to each object of the training set and thereby controls its influence on the shape of the decision boundary.

All the proposed incremental one-class SVMs inherit the problem of the classic SVM method, which uses only boundary points to build a model, regardless of the spread of the remaining data. Moreover, none of them emphasizes the low variance directions, which results in performance degradation. Therefore, in this paper we address mainly this problem with an incremental COSVM (iCOSVM) approach. iCOSVM has the advantage of incrementally emphasizing the low variance directions to improve classification performance, which is not the case for classical incremental one-class models. Our proposal aims to take advantage of the accuracy of the COSVM procedure, and we show that it is a good candidate for learning in non-stationary environments.

The rest of the paper is organized as follows. Section 2 reviews the canonical COSVM method, since it is the basis of our proposed method. In Sect. 3, we present in detail the mathematical derivation of iCOSVM and describe the incremental algorithm. Section 4 presents our experimental studies and comparisons with the canonical COSVM and other incremental one-class classifiers. Finally, Sect. 5 contains some concluding remarks and perspectives.

2 The COSVM Method

Mathematically, OSVM tries to find the hyperplane that separates the training data from the origin with maximum margin. It can be modeled by the following dual problem, formulated using Lagrange multipliers.

$$\begin{aligned}&\min _{\varvec{\alpha }}\frac{1}{2}\varvec{\alpha }^T\mathbf {K}\varvec{\alpha }+b\left( 1-\sum _{i=1}^{N}\alpha _{i}\right) .\\ \nonumber&s.t.\;\;\; 0\le \alpha _i\le \frac{1}{\nu N}=C, \;\;\sum _{i=1}^{N}\alpha _{i}=1. \end{aligned}$$
(1)

Here, \(\nu \in (0,1]\) is a key parameter that upper-bounds the fraction of outliers and lower-bounds the fraction of support vectors, C is the penalty weight punishing the misclassified training examples, \(\mathbf {K}\left( x_{i},x_{j}\right) =\left\langle \varPhi \left( x_{i}\right) ,\varPhi \left( x_{j}\right) \right\rangle ,\forall i,j\in \left\{ 1,2,\ldots ,N\right\} \) is the kernel matrix for the training data, and \(\varvec{\alpha }\) are the Lagrange multipliers to be determined.

The covariance matrix is then plugged into the dual problem, and a parameter \(\eta \in \left[ 0,1\right] \) is introduced to control the contributions of the kernel matrix \(\mathbf {K}\) and the covariance matrix to the objective function. The COSVM optimization problem can be written as follows:

$$\begin{aligned}&\min _{\varvec{\alpha }}W\left( \alpha ,b\right) =\frac{1}{2}\varvec{\alpha }^T\left( \eta \mathbf {K}+(1-\eta )\varDelta \right) \varvec{\alpha }+b\left( 1-\sum _{i=1}^{N}\alpha _i\right) .\\ \nonumber&s.t.\;\;\; 0\le \alpha _i\le C, \;\;\sum _{i=1}^{N}\alpha _i=1, \end{aligned}$$
(2)

where \(\varDelta =\mathbf {K}\left( I-1_{N}\right) \mathbf {K}^{T}\) is the covariance matrix estimated in the kernel space. Setting \(\eta =1\) recovers the original OSVM problem (1), while smaller values of \(\eta \) put more weight on the low variance directions.
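For illustration, the following sketch (Python/NumPy, not the authors' Matlab implementation) builds \(\mathbf {K}\), \(\varDelta \) and the combined matrix \(\varGamma =\eta \mathbf {K}+(1-\eta )\varDelta \) used throughout Sect. 3, assuming an RBF kernel and assuming that \(1_{N}\) denotes the \(N\times N\) matrix with all entries equal to 1/N (the usual kernel centering term).

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """RBF kernel matrix: K[i, j] = exp(-||x_i - y_j||^2 / sigma)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / sigma)

def combined_matrix(X, eta=0.7, sigma=1.0):
    """Gamma = eta*K + (1-eta)*Delta with Delta = K (I - 1_N) K^T (Eq. (2))."""
    N = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    centering = np.eye(N) - np.full((N, N), 1.0 / N)   # I - 1_N
    Delta = K @ centering @ K.T
    return eta * K + (1.0 - eta) * Delta
```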

3 The Incremental COSVM Method

The key idea of our method is to construct the solution recursively, adding one point at a time [12], while retaining the Karush-Kuhn-Tucker (KKT) conditions on all previously acquired data.

Fig. 1. Subset \(\mathcal {S}\): \(g_{i}=0\) and \(0<\alpha _{i}<C\).

Fig. 2. Subset \(\mathcal {E}\): \(g_{i}<0\) and \(\alpha _{i}=C\).

Fig. 3. Subset \(\mathcal {O}\): \(g_{i}>0\) and \(\alpha _{i}=0\).

3.1 Karush-Kuhn-Tucker Conditions

Both the kernel matrix \(\mathbf K \) and the covariance matrix \(\varDelta \) are positive definite [17]. Therefore, the proposed method still results in a convex optimization problem whose solution has a single global optimum and can be computed efficiently with standard numerical methods. The Karush-Kuhn-Tucker (KKT) conditions [18] characterize this optimum and form the basis of the incremental update.

First, let’s note

$$\varGamma = \left( \eta \mathbf {K}+(1-\eta )\varDelta \right) . $$

The slopes \(g_{i}\) of the cost function W in Eq. (2) are expressed using the KKT conditions as:

$$\begin{aligned} g_{i}=\frac{\partial W}{\partial \alpha _{i}}=\sum _{j}\varGamma _{i,j}\alpha _{j}-b{\left\{ \begin{array}{ll} \ge 0; &{} \alpha _{i}=0\\ =0; &{} 0<\alpha _{i}<C\\ \le 0; &{} \alpha _{i}=C \end{array}\right. } \end{aligned}$$
(3)
$$\begin{aligned} \frac{\partial W}{\partial b}=1-\sum _{i=1}^{N}\alpha _{i}=0. \end{aligned}$$
(4)

According to the KKT conditions above, the target training data can be divided into three categories, shown in Figs. 1, 2 and 3:

  1. margin or unbounded support vectors: \(\mathcal {S}=\left\{ i \mid 0<\alpha _{i}<C\right\} \),

  2. error or bounded support vectors: \(\mathcal {E}=\left\{ i \mid \alpha _{i}=C\right\} \),

  3. non-support vectors: \(\mathcal {O}=\left\{ i \mid \alpha _{i}=0\right\} \).

The KKT conditions must hold for all previously trained data before a new point \(x_{c}\) is added and must be preserved after the new point is trained. Hence, the changes of the Lagrange multipliers \(\varDelta \alpha \) are determined so as to maintain the KKT conditions.
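As a small illustration (Python/NumPy, hypothetical variable names), the gradients of Eq. (3) can be evaluated and the training indices partitioned into \(\mathcal {S}\), \(\mathcal {E}\) and \(\mathcal {O}\) as follows; a small tolerance replaces exact equality tests on floating-point values.

```python
import numpy as np

def partition_sets(Gamma, alpha, b, C, tol=1e-8):
    """Split indices into margin SVs (S), error SVs (E) and non-SVs (O).

    Gamma : (N, N) combined matrix eta*K + (1-eta)*Delta
    alpha : (N,) Lagrange multipliers
    b     : scalar offset
    """
    g = Gamma @ alpha - b                                   # gradients of Eq. (3)
    S = np.where((alpha > tol) & (alpha < C - tol))[0]      # 0 < alpha_i < C, g_i ~ 0
    E = np.where(alpha >= C - tol)[0]                       # alpha_i = C,  g_i <= 0
    O = np.where(alpha <= tol)[0]                           # alpha_i = 0,  g_i >= 0
    return g, S, E, O
```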

3.2 Adiabatic Increments

To maintain the equilibrium of the KKT conditions expressed in Eqs. (3) and (4), we express them differentially:

$$\begin{aligned} \varDelta g_{i}=\varGamma _{i,c}\varDelta \alpha _{c}+\sum _{j\in \mathcal {S}}\varGamma _{i,j}\varDelta \alpha _{j}-\varDelta b, \end{aligned}$$
(5)
$$\begin{aligned} \varDelta \alpha _{c} + \sum _{j\in \mathcal {S}}\varDelta \alpha _{j} = 0. \end{aligned}$$
(6)

The two equations above can be written as:

$$\begin{aligned} \left[ \begin{array}{c} \varDelta g_{c} \\ \varDelta g_{s} \\ \varDelta g_{r} \\ 0 \end{array} \right] = \begin{bmatrix} 1&\varGamma _{c,s} \\ 1&\varGamma _{s,s} \\ 1&\varGamma _{r,s} \\ 0&1 \end{bmatrix} \left[ \begin{array}{c} -\varDelta b \\ \varDelta \alpha _{s} \end{array} \right] +\varDelta \alpha _{c} \left[ \begin{array}{c} \varGamma _{c,c} \\ \varGamma _{s,c} \\ \varGamma _{r,c} \\ 1\end{array} \right] . \end{aligned}$$
(7)

Since \(\varDelta g_{i}=0\) for \(i\in \mathcal {S}\) (it remains zero), lines 2 and 4 of the system (7) can be rewritten as:

$$\begin{aligned} \left[ \begin{array}{c} 0\\ 0 \end{array} \right] = \begin{bmatrix} 0&\mathbf 1 \\ \mathbf 1&\varGamma _{s,s} \end{bmatrix} \left[ \begin{array}{c} -\varDelta b \\ \varDelta \alpha _{s} \end{array} \right] +\varDelta \alpha _{c} \left[ \begin{array}{c} \mathbf 1 \\ \varGamma _{s,c} \end{array} \right] . \end{aligned}$$
(8)

Thus, we can express the dependence of \(\varDelta b\) and of \(\varDelta \alpha _{i}\), \(i\in \mathcal {S}\), on \(\varDelta \alpha _{c}\) as follows:

$$\begin{aligned} \left[ \begin{array}{c} -\varDelta b \\ \varDelta \alpha _{s} \end{array} \right] = -\mathbf R \begin{bmatrix} 1 \\ \varGamma _{s,c} \end{bmatrix}\varDelta \alpha _{c}, \end{aligned}$$
(9)

with

$$\mathbf R = \begin{bmatrix} 0&\mathbf 1 \\ \mathbf 1&\varGamma _{s,s} \end{bmatrix}^{-1}. $$

Here, \(\varGamma _{s,s}\) is the sub-matrix of \(\varGamma \) restricted to the margin support vectors, and \(\varGamma _{s,c}\) is the vector of entries between the margin support vectors and the new candidate vector \(x_{c}\).
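In a minimal implementation (a sketch, not the toolbox code), \(\mathbf R \) can be initialized once by direct inversion of this small bordered matrix over the current margin support vectors and thereafter maintained only through the rank-one updates of Sect. 3.3:

```python
import numpy as np

def initial_R(Gamma, S):
    """Invert the bordered matrix [[0, 1^T], [1, Gamma_SS]] over the margin SVs S."""
    ns = len(S)
    M = np.zeros((ns + 1, ns + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = Gamma[np.ix_(S, S)]
    return np.linalg.inv(M)
```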

Equation (9) can be rewritten as:

$$\left[ \begin{array}{c} -\varDelta b \\ \varDelta \alpha _{s} \end{array} \right] =\beta \varDelta \alpha _{c}, $$

where

$$\begin{aligned} \beta =-\mathbf R \begin{bmatrix} 1 \\ \varGamma _{s,c}\end{bmatrix}. \end{aligned}$$
(10)

In equilibrium,

$$\begin{aligned} {\left\{ \begin{array}{ll} \varDelta b = -\beta _{b} \varDelta \alpha _{c},\\ \varDelta \alpha _{j}=\beta _{j}\varDelta \alpha _{c}, j\in \mathcal {S} \end{array}\right. } \end{aligned}$$
(11)

and \(\beta _{j}=0\) for all j outside the subset \(\mathcal {S}\).

Substituting Eq. (11) into lines 1 and 3 of the system (7) leads to the desired relation between \(\varDelta g_{i}\) and \(\varDelta \alpha _{c}\):

$$\begin{aligned} \varDelta {g_{i}}= \gamma _{i}\,\varDelta \alpha _{c},\quad i\in \left\{ 1,\ldots ,N\right\} \cup \{c\}, \end{aligned}$$
(12)

where we define

$$\begin{aligned} {\left\{ \begin{array}{ll} \gamma _{i}=\varGamma _{i,c}+\beta _{b}+ \sum _{j\in \mathcal {S}}\varGamma _{i,j}\beta _{j},\ i\notin \mathcal {S}\\ \gamma _{i}=0,\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ i\in \mathcal {S} \end{array}\right. } \end{aligned}$$
(13)
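Here \(\beta _{b}\) denotes the first component of \(\beta \) (the sensitivity of b from Eq. (11)), which enters \(\gamma _{i}\) through the \(-\varDelta b\) term of Eq. (5). Given \(\mathbf R \), both sensitivities reduce to matrix-vector products, as in the following sketch (same hypothetical conventions as the previous snippets, with `rest` holding the indices outside \(\mathcal {S}\)):

```python
import numpy as np

def sensitivities(Gamma, R, S, rest, c):
    """beta (Eq. (10)) and gamma (Eq. (13)) for a candidate point c.

    beta[0] is the Delta b sensitivity (beta_b); beta[1:] aligns with S.
    gamma is returned for the indices in `rest` plus c itself; it is zero on S.
    """
    v = np.concatenate(([1.0], Gamma[S, c]))                 # [1; Gamma_{S,c}]
    beta = -R @ v                                            # Eq. (10)
    idx = np.concatenate((rest, [c])).astype(int)
    gamma = Gamma[idx, c] + beta[0] + Gamma[np.ix_(idx, S)] @ beta[1:]
    return beta, gamma
```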

3.3 Vectors Entering and Leaving a Subset

During the incremental procedure, a new example \(x_{c}\) is added to the previous training set and, depending on the values of the computed parameters \(g_{c}\) and \(\alpha _{c}\), \(x_{c}\) is recognized as a margin support vector, an error vector or a non-support vector. If \(x_{c}\) is classified as a support vector, the set \(\mathcal {S}\), as well as the classification boundary and margins, must be updated. Since the margin support vectors are our first concern in the classification process, it is worth focusing on the changes in the subset \(\mathcal {S}\). Moreover, we can see from Eqs. (10), (11), (12) and (13) of the previous section that only the matrix \(\mathbf R \) needs to be maintained to obtain all updated parameters. Let us consider a vector \(x_{k}\) entering the subset \(\mathcal {S}\). Using the Woodbury formula [19], \(\mathbf R \) expands as:

$$\begin{aligned} \widetilde{\mathbf{R }}=\begin{bmatrix} R&0\\0&0 \end{bmatrix}+\frac{1}{\gamma _{c}}\left[ \begin{array}{c} \beta \\ 1 \end{array} \right] \left[ \begin{array}{c} \beta \\ 1 \end{array} \right] ^{T}. \end{aligned}$$
(14)

When \(x_{k}\) leaves \(\mathcal {S}\), the same formula shows that \(\mathbf R \) contracts as:

$$\begin{aligned} \widetilde{\mathbf{R }}= \mathbf R _{\overline{k},\overline{k}}- \frac{1}{\mathbf R _{k,k}}\,\mathbf R _{\overline{k},k}\,\mathbf R _{k,\overline{k}}. \end{aligned}$$
(15)
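Both bookkeeping updates are cheap rank-one operations. The sketch below (Python/NumPy, same hypothetical conventions; note that position 0 of \(\mathbf R \) holds the b row, so the k-th margin support vector sits at position k+1) illustrates Eqs. (14) and (15):

```python
import numpy as np

def expand_R(R, beta, gamma_c):
    """Eq. (14): grow R when the candidate point enters the margin set S."""
    n = R.shape[0]
    R_big = np.zeros((n + 1, n + 1))
    R_big[:n, :n] = R
    u = np.concatenate((beta, [1.0]))
    return R_big + np.outer(u, u) / gamma_c

def contract_R(R, k):
    """Eq. (15): shrink R when the k-th margin support vector leaves S."""
    p = k + 1                                   # offset by the b row/column
    keep = [i for i in range(R.shape[0]) if i != p]
    return R[np.ix_(keep, keep)] - np.outer(R[keep, p], R[p, keep]) / R[p, p]
```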
Fig. 4. Case 1: schematic depiction of the decision hyperplane for iCOSVM when the optimal control parameter value is \(\eta =1\). The optimal linear projection is along the direction of high variance.

Fig. 5. Case 2: schematic depiction of the decision hyperplane for iCOSVM when the optimal control parameter value is \(\eta =0\). The optimal linear projection is along the direction of low variance.

Fig. 6. General case: schematic depiction of the decision hyperplane for iCOSVM when the optimal parameter value lies between 0 and 1 (\(0<\eta <1\)). The linear projection direction for iOSVM (dotted arrows) results in a higher overlap between the example target and hypothetical outlier data (dotted boundary) than the iCOSVM projection direction (solid arrows, overlap circled by the solid boundary).

3.4 The Impact of the Tradeoff Parameter \(\eta \)

The contributions of the kernel matrix \(\mathbf K \) and the covariance matrix \(\varDelta \) are controlled by the parameter \(\eta \). Figures 4, 5 and 6 present three different cases showing the impact of the covariance matrix on the direction of the separating hyperplane in the kernel space. In Fig. 4, the optimal decision hyperplane lies in the same direction as the high variance direction; hence, the low variance direction will not improve the separating direction, and the value of \(\eta \) should be set to 1 in order to eliminate the covariance matrix term. In Fig. 5, on the other hand, the optimal decision hyperplane is parallel to the low variance direction; therefore the incremental OSVM (iOSVM) term (the kernel matrix) is ignored by setting \(\eta \) to 0. In real world cases, however, it is very rare that the optimal decision hyperplane is aligned with either the lowest or the highest variance direction. For this reason, the value of \(\eta \) needs to be tuned so that the linear projections of the target data and the outlier data overlap as little as possible. As Fig. 6 shows, by using an optimal \(\eta \) value, iCOSVM can reduce the large overlap caused by the iOSVM projection.

3.5 The Incremental Algorithm

Our implementation of incremental Covariance-guided One-Class SVM is presented as pseudo-code in Algorithm 1.

Algorithm 1. Incremental COSVM (iCOSVM).

If the KKT equilibrium is not reached, the parameters are moved sequentially until it is. We aim to determine the largest possible increment \(\triangle \alpha _{c}\) such that the decomposition of the data into the three subsets remains intact, while accounting for the migration of some points from one subset to another during the update process. This is the idea of adiabatic increments [12].
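The core of each iteration is the computation of this largest admissible increment. The sketch below (Python/NumPy) illustrates the bookkeeping under the same hypothetical conventions as the earlier snippets; it returns the limiting step and the event that triggered it, so that a caller can migrate the corresponding point between subsets and iterate, in the spirit of Algorithm 1 but without claiming to reproduce the toolbox implementation.

```python
import numpy as np

def largest_increment(Gamma, alpha, b, R, S, E, O, c, C, tol=1e-8):
    """Largest Delta alpha_c that keeps every point in its current subset."""
    g = Gamma @ alpha - b                                    # gradients, Eq. (3)
    rest = np.concatenate((E, O)).astype(int)
    v = np.concatenate(([1.0], Gamma[S, c]))
    beta = -R @ v                                            # Eq. (10)
    idx = np.concatenate((rest, [c])).astype(int)
    gamma = Gamma[idx, c] + beta[0] + Gamma[np.ix_(idx, S)] @ beta[1:]   # Eq. (13)

    limits = [(C - alpha[c], ("c reaches C", c))]            # alpha_c is bounded by C
    if gamma[-1] > tol:                                      # g_c rising towards zero
        limits.append((-g[c] / gamma[-1], ("c joins S", c)))
    for pos, j in enumerate(S):                              # margin SVs hitting 0 or C
        if beta[pos + 1] > tol:
            limits.append(((C - alpha[j]) / beta[pos + 1], ("S to E", j)))
        elif beta[pos + 1] < -tol:
            limits.append((-alpha[j] / beta[pos + 1], ("S to O", j)))
    for pos, j in enumerate(rest):                           # error/non-SVs reaching g_j = 0
        moving_up = j in E and gamma[pos] > tol              # g_j < 0 increasing
        moving_down = j in O and gamma[pos] < -tol           # g_j > 0 decreasing
        if moving_up or moving_down:
            limits.append((-g[j] / gamma[pos], ("joins S", j)))
    step, event = min((l for l in limits if l[0] >= 0), key=lambda l: l[0])
    return step, event, beta, gamma
```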

4 Experimental Results

In this section, we present a detailed experimental analysis of our proposed method, performed on artificially synthesized datasets and real-world datasets. We have evaluated its performance with two different sets of experiments. In the first one, we compared accuracy and training time with the non-incremental COSVM, to tease out the advantage of our incremental model over the batch learning model. In the second set, we compared the performance of iCOSVM against contemporary incremental one-class classifiers, to show the advantage of incrementally projecting data onto the low variance directions. For the implementation, we used Tax’s data description toolbox [20] in Matlab. First, we provide an analysis of the effect of tuning the key control parameter \(\eta \); this analysis will lead us to decide how to optimize the value of \(\eta \) for a particular dataset.

4.1 Optimising the Value of \(\eta \)

Cross-validation cannot be used to optimize the value of \(\eta \), since no outlier examples are available at training time. Therefore, a stopping criterion is used to find a suitable value. We use a pre-defined lowest allowed fraction of outliers \(\left( f_{OL}\right) \) as the stopping criterion. For a new dataset, we set \(\eta \) to 1 and decrease its value while observing the fraction of outliers; when it hits \(f_{OL}\), we stop and use the current value of \(\eta \) for the considered dataset. We note that there is no conflict between \(f_{OL}\) and the OSVM parameter \(\nu \): they can be set independently to fit the purpose of the dataset to be trained on. There are no strict conditions on how to choose the value of \(\nu \); it can be set to any value from 0 to 1 [21]. Our additional parameter \(f_{OL}\) can be set to any value between 0 and \(\nu \).
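A minimal sketch of this selection loop (the trainer `train_icosvm` and the helper `fraction_of_outliers` are hypothetical stand-ins for the corresponding toolbox calls):

```python
import numpy as np

def select_eta(X_train, f_ol=0.1, step=0.05, nu=0.2, sigma=1.0):
    """Decrease eta from 1 until the fraction of rejected training targets
    falls to the pre-defined threshold f_ol (Sect. 4.1)."""
    model = None
    for eta in np.arange(1.0, -1e-9, -step):
        model = train_icosvm(X_train, eta=eta, nu=nu, sigma=sigma)   # hypothetical trainer
        if fraction_of_outliers(model, X_train) <= f_ol:             # hypothetical helper
            return eta, model
    return 0.0, model   # threshold never reached: keep the covariance-only model
```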

Table 1. Description of datasets.
Fig. 7. Artificial datasets used for comparison. The two shapes denote two different classes generated from a pre-defined distribution. Each class was used as target and outlier in turns.

4.2 Datasets Used

We have used both artificially generated datasets and real-world datasets in our experiments to tease out the effectiveness of the proposed method in different scenarios. For the experiments on artificially generated data, we created a number of 2D two-class datasets drawn from two different families of distributions: (1) Gaussian distributions with different covariance matrices, and (2) banana-shaped distributions with different variances. For each distribution, two datasets were generated, one with low overlap and one with high overlap. Each class of each dataset was used as target class and as outliers in turns, so that performance is evaluated on 8 datasets (2 distributions \(\times \) 2 classes \(\times \) 2 overlaps). Figure 7 presents plots of the generated datasets.
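For reference, data of this kind can be generated along the following lines (a sketch only: the exact covariances, banana curvature and noise levels used in the paper are not specified, so the values below are illustrative placeholders):

```python
import numpy as np

def gaussian_pair(n=500, overlap="low", seed=0):
    """Two 2D Gaussian classes with different covariance matrices."""
    rng = np.random.default_rng(seed)
    shift = 4.0 if overlap == "low" else 1.5
    c1 = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 0.5]], n)
    c2 = rng.multivariate_normal([shift, shift], [[0.5, -0.2], [-0.2, 1.0]], n)
    return c1, c2

def banana_pair(n=500, overlap="low", seed=0):
    """Two banana-shaped classes: points on arcs plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    noise = 0.1 if overlap == "low" else 0.4
    t = rng.uniform(-np.pi / 2, np.pi / 2, n)
    c1 = np.c_[np.cos(t), np.sin(t)] + noise * rng.standard_normal((n, 2))
    c2 = np.c_[1.0 - np.cos(t), 0.5 - np.sin(t)] + noise * rng.standard_normal((n, 2))
    return c1, c2
```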

For the real-world case, we focused on medical datasets, as this domain is one of the key fields where one-class classification is applied [1]. A detailed description of the datasets used can be found in Table 1. These datasets were collected from the UCI machine learning repository [22] and picked carefully, so that we have a variety of sizes and dimensions and can test the robustness of iCOSVM. As these datasets are originally two-class or multi-class, we used one class as the target class and kept the remaining classes as outliers.

4.3 Experimental Protocol

To make sure that our results are not coincidental or overoptimistic, we used a cross-validation process [23]. The considered dataset was randomly split into 10 subsets of equal size. To build a model, one of the 10 subsets was removed and the rest was used as training data; the removed subset was then added to the outliers and this whole set was used for testing. Finally, the 10 accuracy estimates were averaged to provide the accuracy over all the models of a dataset. Moreover, to measure the performance of a one-class classifier, Receiver Operating Characteristic (ROC) curves [24] are usually used. The ROC curve is a powerful measurement of the performance of the studied classifier: it depends neither on the number of training or testing data points nor on the number of outliers, but only on the rates of correct and incorrect target detection. To evaluate the methods, we also used the Area Under the Curve (AUC) [25] of the ROC curves, which we report in the results.
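The protocol can be summarized by the following sketch, which uses scikit-learn's `KFold` and `roc_auc_score` for the splitting and the AUC; `train_icosvm` and `decision_scores` are hypothetical stand-ins for the toolbox training and scoring calls.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score

def evaluate_auc(targets, outliers, n_splits=10, seed=0, **params):
    """10-fold protocol of Sect. 4.3: train on 9/10 of the targets, test on the
    held-out targets plus all outliers, and average the per-fold AUC."""
    aucs = []
    for train_idx, test_idx in KFold(n_splits, shuffle=True, random_state=seed).split(targets):
        model = train_icosvm(targets[train_idx], **params)      # hypothetical trainer
        X_test = np.vstack([targets[test_idx], outliers])
        y_test = np.r_[np.ones(len(test_idx)), np.zeros(len(outliers))]
        scores = decision_scores(model, X_test)                 # hypothetical scorer
        aucs.append(roc_auc_score(y_test, scores))
    return float(np.mean(aucs))
```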

4.4 Classifiers

iCOSVM was evaluated by comparing its performance against the following classifiers:

  • COSVM: since our incremental approach is built upon COSVM, this classifier has been described in detail in Sect. 2.

  • iOSVM: This method tries to find, recursively, the maximum margin hyperplane that separates targets from outliers.

  • iSVDD: This method gives the sphere boundary description of the target data points with minimum volume.

The incremental classifiers iOSVM, iSVDD and iCOSVM were implemented with the help of DDtools [20]. For the implementation of COSVM, the SVM-KM toolbox [26] was used. The radial basis function kernel, calculated as \(\mathcal {K}(x_i, x_j)=e^{-{\Vert x_i-x_j\Vert }^2/\sigma }\), was used for kernelization; it has proved to be robust and flexible [27]. Here, \(\sigma \) is the positive “width” parameter. For the optimization of \(\eta \), the value of \(\sigma \) was set to 1, but when comparing with the other methods, \(\sigma \) was optimized first. The parameter \(\nu \) of COSVM, iOSVM and iCOSVM, also called the fraction of rejection in the case of iSVDD, was set to 0.2.

While optimizing \(\eta \), the lowest threshold for the fraction of outliers (\(f_{OL}\)) was set to 0.1 (see Sect. 4.1). However, it is difficult, and even impossible, to define optimal values for the parameters \(f_{OL}\) and \(\nu \) in real cases, where the data points are unknown at the beginning of the classification process. Therefore, we set both parameters to 0.2.

4.5 Results and Discussion

To test the effectiveness of our proposed algorithm, we started by comparing iCOSVM with the canonical COSVM on artificially generated datasets.

As shown in Table 2, iCOSVM provides better results in terms of AUC values on all datasets, averaging over 10 different models. Figure 8 shows the average training time per model for artificial datasets of different sizes. The training of our algorithm is faster than that of COSVM, mainly on large datasets, and shows only an insignificant variation as the size of the dataset increases. It has been shown in a number of recent studies [28] that incremental learning algorithms can outperform batch learning algorithms in both speed and accuracy, because they provide cleaner solutions.

Table 2. Average AUC of COSVM and iCOSVM for the 8 artificial datasets. Each dataset has 1000 data points (best method in bold).
Fig. 8. Log of training times (per model) in seconds for COSVM and iCOSVM for the experiments on the artificial datasets of different sizes.

In fact, the complexity of solving the convex optimization problem of COSVM is \(O(N^3)\), where N is the number of training data points. A key to the efficiency of the iCOSVM algorithm lies in eliminating the performance bottleneck of explicitly inverting matrices to solve this problem. These inversions are avoided thanks to the Woodbury formula used for the re-computation of the gradient, \(\beta \) and \(\gamma \): the update involves only matrix-vector multiplications and recursive updates of the matrix \(\mathbf R \), whose dimension equals the number of support vectors \(N_s\). The running time needed for an update of \(\mathbf R \) is quadratic in the number of support vectors, which is much better than explicit inversion. Thus, in incremental learning, the complexity is \(O(N_s^2)\), with \(N_s \le N\).

Table 3. Average AUC of each method for the 12 artificial datasets (best method in bold, second best emphasized).
Table 4. Average AUC of each method for the 12 real-world datasets (best method in bold, second best emphasized).

Tables 3 and 4 contain the average AUC of the incremental classifiers on the artificial and real datasets, respectively. As we can see, iCOSVM provides better results on all datasets. Especially in the case of the biomedical and chromosome datasets, iCOSVM performs significantly better than the other methods. It is not surprising that iSVDD gives almost the worst accuracy values, as SVM and its derivatives are constructed to give a better separation [29].

We notice that iCOSVM outperforms the other classifiers with \(\eta \) values in the neighborhood of 0.7, which puts more emphasis on the kernel matrix while fine-tuning the contribution of the covariance matrix.

Since the covariance matrix is computed once as a pre-processing step and re-used during the whole training phase, iCOSVM has no additional training overhead on top of the original iOSVM. Table 5 shows the average training times per model for both the artificial and the real-world datasets. As expected, iCOSVM is almost as fast as iOSVM, while providing better classification accuracy.

Table 5. Average training times (per model) in seconds for iOSVM and iCOSVM for the experiments on the artificial and real-world datasets.

We also present some individual graphical results by plotting the actual ROC curves for a real-world dataset. Figure 9 shows the ROC curves of the three incremental classifiers for four models of the chromosome dataset. The rule of thumb for judging the performance of a classifier from a ROC curve is that the best classifier has the largest area under the curve. We can clearly see from Fig. 9 that iCOSVM indeed leads to better ROC curves.

Fig. 9. ROC curves for the three incremental classifiers applied on the Chromosome dataset.

5 Conclusion

In this paper, we have proposed an incremental Covariance-guided One-Class Support Vector Machine (iCOSVM) classification approach. iCOSVM improves upon the incremental One-Class Support Vector Machine by incorporating the covariance matrix into the optimization problem. The newly introduced term emphasizes the projection onto the directions of low variance of the training data. The contributions of the kernel and covariance matrices are controlled via a parameter that is tuned efficiently for optimum performance. iCOSVM takes advantage of the high accuracy of the canonical Covariance-guided One-Class Support Vector Machine (COSVM). We have presented detailed experiments on several artificial and real-world datasets, in which we compared our method against contemporary batch and incremental learning methods; the results have shown the superiority of our method. Future work will consist in validating these results in demanding applications such as face recognition and anomaly detection.