Low false positive learning with support vector machines

https://doi.org/10.1016/j.jvcir.2016.03.007

Highlights

  • Novel 2-level classification method for low false positive classification.

  • Level 1 defines a decision boundary through an SVM classifier.

  • Level 2 defines a sensitive area around the decision boundary.

  • The sensitive area is analyzed by a second classifier to control false positives.

  • Method’s effectiveness shown through comparisons to other solutions on 33 datasets.

Abstract

Most machine learning systems for binary classification are trained using algorithms that maximize accuracy and assume that false positives and false negatives are equally bad. In many applications, however, these two types of errors can have very different costs. In this paper, we consider the problem of controlling the false positive rate of SVMs, since their traditional formulation offers no such assurance. To solve this problem, we define a sensitive area of the feature space, where the probability of producing false positives is higher, and use a second classifier (unanimity k-NN) in this area to better filter errors and improve the decision-making process. We call this method Risk Area SVM (RA-SVM). We compare RA-SVM to other state-of-the-art methods for low false positive classification using 33 standard datasets from the literature. The solution we propose shows better performance in the vast majority of cases under the standard Neyman–Pearson measure.

Introduction

There are several applications that are sensitive to false positives, such as spam filtering, face recognition and computer-aided diagnosis. In these applications, the errors on one class are much more costly than errors on the other, and keeping the false positive rate under a maximal tolerance is usually more important than achieving a high classification accuracy. For example, in computer-based diagnosis, especially if the automated system is being used for triage of patients, falsely determining that a case is normal is much more serious than falsely determining that a case is abnormal. If a case is flagged as abnormal, it will usually proceed to a more costly diagnostic procedure, but a case flagged as normal will not be investigated further, so a case falsely determined as normal will keep the wrong diagnosis. The wrong diagnosis will cause the patient or the physician to believe the condition is absent and leave it untreated, which may have serious consequences. If the computer diagnostic flags the patient as having the disease, a more complex (and potentially more costly) procedure will be performed, which will likely determine that the patient does not have the condition. We will call the situation of wrongly flagging a case as normal (the higher-cost error) a false positive, also called in the literature a false alarm. We will also say that the positive class is more sensitive. Formally, for a given classifier $f$ and a data point $x_i \in \mathbb{R}^d$, where $d$ is the feature space dimension, the true class of $x_i$ is denoted by $y_i$ while $f(x_i)$ denotes the class of $x_i$ predicted by $f$. A false positive is a data point $x_i$ such that $f(x_i) = +1$ but $y_i = -1$. A false negative is a point $x_j$ such that $f(x_j) = -1$ but $y_j = +1$. The false positive rate of the classifier $f$ is then
$$FP(f) = \frac{\left|\{x_i \mid f(x_i) = +1 \text{ and } y_i = -1\}\right|}{\left|\{x_i \mid y_i = -1\}\right|}.$$
Similarly, the false negative rate $FN(f)$ is the number of false negatives divided by the number of positive cases.
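Both rates can be estimated directly from a labeled evaluation set. Below is a minimal sketch in Python/NumPy following the sign conventions above; the function and array names are illustrative, not from the paper:

```python
import numpy as np

def error_rates(y_true, y_pred):
    """False positive and false negative rates under the paper's
    sign convention: +1 denotes the sensitive (positive) class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    neg = y_true == -1                    # ground-truth negatives
    pos = y_true == +1                    # ground-truth positives
    fp = np.mean(y_pred[neg] == +1)       # FP(f): negatives predicted +1
    fn = np.mean(y_pred[pos] == -1)       # FN(f): positives predicted -1
    return fp, fn
```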

The Support Vector Machine (SVM) is a powerful algorithm for binary classification, known for its ability to handle high-dimensional data efficiently. It has been widely used in many applications, providing state-of-the-art accuracy on many classification problems. However, in the context of low false positive learning, a drawback of the traditional support vector classifier formulation is that it penalizes errors in both classes equally and offers no assurance regarding the false positive rate. Thus, in problems such as spam filtering, where a false positive rate constraint must be satisfied, the traditional SVM can be of little use.

To address these limitations, some extensions to SVM have been proposed that aim at controlling errors asymmetrically. The most common techniques are Bias-Shifting (BS) and the Cost-Sensitive SVM (CS-SVM). While the former tries to control false alarms by shifting the SVM’s decision boundary toward the sensitive class (positive, in our case), the latter adjusts the SVM’s formulation to make misclassifications of the sensitive class more costly than those of the other class. CS-SVM offers state-of-the-art results on the problem of low false positive classification, and several studies have followed this direction. BS, on the other hand, gives results close to those of CS-SVM and is as efficient as the traditional SVM.
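To make the bias-shifting baseline concrete, the sketch below fits a standard SVM and then raises the decision threshold on a validation set until the false positive rate drops below a tolerance. This is an illustrative reconstruction using scikit-learn, not the authors' implementation; the RBF kernel, the validation split, and the parameter `alpha` are assumptions:

```python
import numpy as np
from sklearn.svm import SVC

def bias_shift(X_train, y_train, X_val, y_val, alpha=0.05):
    """Fit a standard SVM, then shift its decision threshold until the
    validation false positive rate is at most alpha (bias-shifting)."""
    clf = SVC(kernel="rbf").fit(X_train, y_train)
    scores = clf.decision_function(X_val)    # signed distance to the hyperplane
    neg = np.asarray(y_val) == -1            # ground-truth negatives
    # Raising the threshold moves the boundary toward the positive class,
    # trading false positives for false negatives; take the smallest
    # threshold that satisfies the constraint to keep FN low.
    for t in np.sort(scores):
        if np.mean(scores[neg] > t) <= alpha:
            return clf, t
    return clf, np.inf                       # degenerate case: predict all negative
```

Predictions on new data then use the shifted threshold, e.g. `np.where(clf.decision_function(X) > t, +1, -1)`.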

In this work, we propose the Risk Area SVM (RA-SVM), a novel method to efficiently solve the low false positive classification problem. It is an extension of the traditional support vector machine classifier and is able to control the false positive rate given a user-specified maximum allowed threshold. The RA-SVM selects a sensitive region close to the SVM’s decision boundary with a high incidence of false positives. Within that region, which we call the risk area, the decision to classify a sample as positive is based on inspecting its k nearest neighbors (k-NN): a new data point inside this region is classified as positive only if all of its k nearest neighbors are positive.
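The decision rule can be sketched as follows. This is a simplified illustration, not the authors' implementation: neighbors are searched in the input space rather than in the kernel feature space RA-SVM uses, and the risk area is approximated by a band of half-width `delta` around the boundary, whereas the paper proposes more refined ways to select this region:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def ra_svm_predict(clf, X_train, y_train, X_new, k=5, delta=0.5):
    """RA-SVM-style decision rule: outside the risk area, keep the SVM
    prediction; inside it, label positive only on a unanimous k-NN vote."""
    scores = clf.decision_function(X_new)
    y_pred = np.where(scores > 0, +1, -1)     # plain SVM decision
    in_risk = np.abs(scores) < delta          # band around the boundary (assumed shape)
    if in_risk.any():
        nn = NearestNeighbors(n_neighbors=k).fit(X_train)
        idx = nn.kneighbors(X_new[in_risk], return_distance=False)
        unanimous = (np.asarray(y_train)[idx] == +1).all(axis=1)
        y_pred[in_risk] = np.where(unanimous, +1, -1)
    return y_pred
```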

The idea of combining k-NN with a region around the SVM’s decision boundary in order to control false positives was first introduced in [1] to solve a problem of automatic triage. Our work builds upon and further explores those ideas. The main advances herein are:

  • We develop a more effective technique for selecting the SVM’s sensitive region;

  • The k-NN classifier now works in the SVM’s high-dimensional feature space (using the same kernel), instead of the original feature space of the data, allowing much better data discrimination (see the sketch after this list);

  • We propose a novel technique that runs up to five times faster than the standard method and offers similar classification quality;

  • We evaluated the proposed methods (and the major techniques in the literature) on several standard benchmarks, from different sources and of different sizes, and under different scenarios (e.g., unbalanced data).
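The second item relies on the kernel trick: squared Euclidean distances in the feature space induced by a kernel $K$ can be computed from kernel evaluations alone, since $\|\phi(x) - \phi(z)\|^2 = K(x,x) - 2K(x,z) + K(z,z)$. A minimal sketch for an RBF kernel (illustrative, not the authors' code):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def feature_space_sq_dists(X_new, X_train, gamma=1.0):
    """Squared distances in the kernel-induced feature space:
    ||phi(x) - phi(z)||^2 = K(x,x) - 2*K(x,z) + K(z,z)."""
    K_xz = rbf_kernel(X_new, X_train, gamma=gamma)   # cross kernel matrix
    k_xx = np.ones(len(X_new))                       # K(x,x) = 1 for the RBF kernel
    k_zz = np.ones(len(X_train))                     # K(z,z) = 1 for the RBF kernel
    return k_xx[:, None] - 2.0 * K_xz + k_zz[None, :]
```

For the RBF kernel the diagonal terms are constant, so ranking candidates by feature-space distance is equivalent to ranking them by kernel value: the k nearest neighbors are simply the k training points with the largest kernel responses.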

The requirement of keeping the false positive rate bounded below a certain level while minimizing the false negative rate is also called the Neyman–Pearson classification paradigm [2], [3]. The requirement can also be stated as maximizing the accuracy (correct predictions) while keeping the false positive rate bounded. Thus, given a user-specified threshold $\alpha$, our objective is to
$$\min_{f} \; FN(f), \quad \text{subject to} \quad FP(f) \le \alpha. \qquad (1)$$
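Scott [3] also proposes collapsing this constrained objective into a single score for comparing classifiers (the exact measure used in our evaluation is discussed in Section 5.1). Assuming the commonly used form, which adds to the false negative rate a $1/\alpha$-scaled penalty for violating the constraint, a minimal sketch:

```python
def np_score(fp, fn, alpha):
    """Neyman-Pearson score (form assumed here, after Scott [3]): the false
    negative rate plus a 1/alpha-scaled penalty for exceeding FP <= alpha."""
    return max(fp - alpha, 0.0) / alpha + fn
```

Lower is better; a classifier that satisfies the constraint is scored purely by its false negative rate.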

The remainder of this paper is organized as follows. First, we briefly discuss SVM and its features. After that, we discuss alternatives for SVM-based Neyman–Pearson classification. Next, we introduce our methods, with four alternatives to define the risk area and solve the low false positive learning problem. We then present the evaluation methodology used to validate the proposed methods and compare them with alternative algorithms from the literature. Finally, we conclude the paper and point out some possible future research opportunities.

Section snippets

Support vector machines

In a typical classification setting, we are given a sample of training vectors $x_1, \ldots, x_n \in \mathbb{R}^d$, each belonging to one of two classes, indicated by the respective labels $y_1, \ldots, y_n \in \{-1, +1\}$. The task is then to find a function $f : \mathbb{R}^d \to \{-1, +1\}$ that accurately predicts the label when presented with a new sample [4].

Support Vector Machines (SVM) are among the most effective methods for binary classification [4]. The idea is to find the maximum-margin hyperplane $(w, b)$ in a high-dimensional space $H$ that

Related work

There are many practical applications that require the classifier to produce a very low false positive rate. Therefore, several studies have been conducted to develop such classifiers, including techniques based on Naïve Bayes [6], [7], boosting [8], [9], [10], data compression [11], neural networks [12], ensemble learning [13], partial least squares [14], and cascades of classifiers [15], [16].

Support vector machines are among the most powerful algorithms for binary

The Risk Area SVM classifier

The Risk Area SVM (RA-SVM) is an extension of the traditional support vector machine classifier that incorporates the ability to control the false positive rate to a user-specified maximum. It is grounded on two facts that generally hold for SVMs:

  1. Most misclassifications occur close to the decision boundary. In a support vector classifier, the further away a point is from the hyperplane, the more confident one is in its classification [31]. This is a well-known fact about SVMs, and it is the primary

Evaluation methodology

In this paper, we followed two experimental setups. For the comparison of the RA-SVM and RA-SVM-SV forms with Bias Shift (BS), One-Class SVM (OC-SVM), and Asymmetric SVM (ASVM), we implemented all the classifiers and chose a set of datasets with different characteristics to measure how well the false positive rate is controlled. Section 5.1 discusses the performance metric used to measure how well each classifier satisfies the Neyman–Pearson criterion (Eq. (1)), while Section 5.2 presents the

Experiments and results

In our previous experiments (not reported in this paper), we found that the OSSRA form of Risk Area SVM performed better on average than the other three versions, regarding both NP-score and FP rates. For simplicity, in this paper we only list the results for OSSRA and OSSRA-SV. However, practitioners should be aware that, for a particular dataset, one of the other three versions may achieve better results. Additional results with the other forms of Risk Area SVM are presented in the

Discussion

Besides the better performance of OSSRA, other results may be of interest to practitioners. Bias Shift seems to be a reasonable alternative method for low false positive learning, while One-Class SVM is almost never the best solution and often functions as a no-classifier, that is, it assigns all data to the negative class. As for ASVM, Wu et al. [29] report results better than BS, which we could not reproduce in these experiments. However, ASVM was developed and tested

Conclusions

Controlling false positives (or false alarms) is paramount in several machine learning problems, ranging from spam filtering to computer-aided diagnosis. In this work, we have proposed a new method for controlling false positives for support vector machines, which we call Risk Area SVM (RA-SVM).

Our approach was based on the hypothesis that the majority of misclassified points lie around the decision boundary [32]. RA-SVM is based on a risk area around the SVM’s decision

Acknowledgments

We thank the Associate Editor and reviewers for their valuable comments. This work was supported by FAPESP (Grant #2010/05647-4), CNPq (Grants #304352/2012-8, and #477662/2013-7), CAPES/DeepEyes, and Microsoft Research.

References (38)

  • H.-N. Qu et al., An asymmetric classifier based on partial least squares, Pattern Recogn. (2010)
  • Y. Sun et al., Cost-sensitive boosting for classification of imbalanced data, Pattern Recogn. (2007)
  • Z. Qi et al., Cost-sensitive support vector machine for semi-supervised learning, Proc. Comput. Sci. (2013)
  • A.B. Andre et al., A combination of support vector machine and k-nearest neighbors for machine fault detection, Appl. Artif. Intell. (2013)
  • C. Scott et al., A Neyman–Pearson approach to statistical learning, IEEE Trans. Inform. Theory (2005)
  • C. Scott, Performance measures for Neyman–Pearson classification, IEEE Trans. Inform. Theory (2007)
  • B. Schölkopf et al., Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond (2002)
  • J.A. Suykens et al., Least squares support vector machine classifiers, Neural Process. Lett. (1999)
  • K.-M. Schneider, A comparison of event models for Naive Bayes anti-spam e-mail filtering, in: Proceedings of the Tenth...
  • I. Androutsopoulos, J. Koutsias, K.V. Chandrinos, C.D. Spyropoulos, An experimental comparison of Naive Bayesian and...
  • X. Carreras, L. Màrquez, Boosting Trees for Anti-Spam Email Filtering, CoRR cs.CL/0109015,...
  • P. Viola et al., Fast and robust classification using asymmetric adaboost and a detector cascade, Adv. Neural Inform. Process. Syst. (2001)
  • H. Masnadi-Shirazi, N. Vasconcelos, Asymmetric boosting, in: Proceedings of the 24th International Conference on...
  • A. Bratko et al., Spam filtering using statistical data compression models, Mach. Learn. Res. (2006)
  • Z.-H. Zhou et al., Training cost-sensitive neural networks with methods addressing the class imbalance problem, IEEE Trans. Knowl. Data Eng. (2006)
  • T.R. Lynam, G.V. Cormack, D.R. Cheriton, On-line spam filter fusion, in: Proceedings of the 29th Annual International...
  • J. Wu, M.D. Mullin, J.M. Rehg, Linear asymmetric classifier for cascade detectors, in: Proceedings of the 22nd...
  • W.-t. Yih, J. Goodman, G. Hulten, Learning at low false positive rates, in: Proceedings of the Third Conference on...
  • G. Karakoulas et al., Optimizing classifiers for imbalanced training sets, Adv. Neural Inform. Process. Syst. 11: Proc. 1998 Conf. (1999)

    This paper has been recommended for acceptance by Dacheng Tao.
