Low false positive learning with support vector machines☆
Introduction
There are several applications that are sensitive to false positives, such as spam filtering, face recognition, and computer-aided diagnosis. In these applications, errors from one class are much more costly than errors from the other, and keeping the false positive rate under a maximal tolerance is usually more important than achieving a high classification accuracy. For example, in computer-based diagnosis, especially if the automated system is being used for triage of patients, falsely determining that a case is normal is much more serious than falsely determining that the case is abnormal. If a case is flagged as abnormal, it will usually proceed to more costly diagnostic procedures, but a case flagged as normal will not be further investigated; thus a case falsely determined as normal will remain with the wrong diagnosis. For a patient, the wrong diagnosis will lead the patient or the physician to believe the condition is absent and leave it untreated, which may have serious consequences. If the computer diagnostic flags the patient as having the disease, a more complex (and potentially more costly) procedure will be performed, which will likely determine that the patient does not have the condition. We will call the situation of wrongly flagging a case as normal (the higher-cost error) a false positive, also called in the literature a false alarm. We will also say that the positive class is more sensitive. Formally, for a given classifier $f$ and a new data point $\mathbf{x} \in \mathbb{R}^d$, where $d$ is the feature space dimension, the true class of $\mathbf{x}$ is denoted by $y(\mathbf{x}) \in \{-1, +1\}$, while $f(\mathbf{x})$ denotes the class predicted by $f$. A false positive is a data point $\mathbf{x}$ such that $y(\mathbf{x}) = -1$ but $f(\mathbf{x}) = +1$; a false negative is a point such that $y(\mathbf{x}) = +1$ but $f(\mathbf{x}) = -1$. The false positive rate of the classifier $f$ is then the number of false positives divided by the number of negative cases; similarly, the false negative rate is the ratio of the number of false negatives divided by the number of positive cases.
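The two rates above can be computed directly from the true and predicted labels. A minimal sketch, assuming labels in {−1, +1} with +1 the sensitive positive class as defined above:

```python
import numpy as np

def false_positive_rate(y_true, y_pred):
    """Fraction of truly negative cases (y = -1) that were predicted positive."""
    neg = y_true == -1
    return np.sum((y_pred == 1) & neg) / np.sum(neg)

def false_negative_rate(y_true, y_pred):
    """Fraction of truly positive cases (y = +1) that were predicted negative."""
    pos = y_true == 1
    return np.sum((y_pred == -1) & pos) / np.sum(pos)

y_true = np.array([1, 1, -1, -1, -1, 1])
y_pred = np.array([1, -1, 1, -1, -1, 1])
fpr = false_positive_rate(y_true, y_pred)  # 1 of 3 negatives misclassified
fnr = false_negative_rate(y_true, y_pred)  # 1 of 3 positives misclassified
```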
Support Vector Machine (SVM) is a powerful algorithm for binary classification, known for its ability to handle high-dimensional data efficiently. It has been widely used in many applications, providing state-of-the-art accuracy for many classification problems. However, in the context of low false positive learning, a drawback of the traditional support vector classifier formulation is that it penalizes errors in both classes equally and offers no assurance regarding the false positive rate. Thus, in problems such as spam filtering, in which a false positive rate constraint must be satisfied, the traditional SVM can be useless.
Nevertheless, observing the aforementioned limitations, some extensions to SVM have been proposed aiming at controlling errors in an asymmetric way. The most common techniques for that are Bias-Shifting (BS) and the Cost-Sensitive SVM (CS-SVM). While the former tries to control false alarms by shifting the SVM’s decision boundary toward the sensitive class (positive in our case), the latter adjusts the SVM’s formulation in order to make misclassifications of the sensitive class more costly than those of the other class. The CS-SVM offers state-of-the-art results on the problem of low false positive classification, and several studies have been made in this direction. The BS, on the other hand, gives results that are close to the CS-SVM and is as efficient as the traditional SVM.
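As an illustration of Bias-Shifting, the sketch below (plain NumPy; the scores stand in for the output of an SVM's decision function on a validation set) sweeps the decision threshold toward the positive class until the validation false positive rate meets the tolerance:

```python
import numpy as np

def bias_shift_threshold(scores, y_val, alpha):
    """Return the smallest threshold t such that predicting +1 when
    score > t gives a validation false positive rate of at most alpha."""
    neg_scores = scores[y_val == -1]
    for t in np.sort(scores):
        if np.mean(neg_scores > t) <= alpha:
            return t
    return np.inf  # defensive fallback; the loop always returns for alpha >= 0

# Toy validation scores (stand-ins for an SVM's decision-function output).
scores = np.array([-2.0, -1.5, -0.2, 0.1, 0.4, 1.2, 2.0, -0.8])
y_val = np.array([-1, -1, -1, 1, -1, 1, 1, -1])
t = bias_shift_threshold(scores, y_val, alpha=0.0)  # t = 0.4
y_pred = np.where(scores > t, 1, -1)
```

The shifted boundary trades false negatives for false positives: here the positive sample with score 0.1 is sacrificed to keep the validation false positive rate at zero.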
In this work, we propose the Risk Area SVM (RA-SVM), a novel method to efficiently solve the low false positive classification problem. It is an extension of the traditional support vector machine classifier and is able to control the false positive rate given a user-specified maximum allowed threshold. The RA-SVM selects a sensitive region close to the SVM’s decision boundary with a high incidence of false positives. Within that region, which we call the risk area, the decision to classify a sample as positive is based on inspecting its k nearest neighbors (k-NN), and a new data point inside this region is classified as positive only if all of its k nearest neighbors are also positive.
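The decision rule just described can be sketched as follows (plain NumPy, with a brute-force neighbor search in the input space for brevity; the risk-area width `delta` and neighborhood size `k` are assumed to be tuned on validation data, and `svm_scores` stands in for the SVM's signed distances to the decision boundary):

```python
import numpy as np

def ra_svm_predict(svm_scores, X_train, y_train, X_new, delta, k):
    """Outside the risk area (|score| >= delta) keep the SVM's decision;
    inside it, predict +1 only if ALL k nearest training neighbors are +1."""
    pred = np.where(svm_scores > 0, 1, -1)
    for i, s in enumerate(svm_scores):
        if abs(s) < delta:  # sample falls inside the risk area
            d2 = ((X_train - X_new[i]) ** 2).sum(axis=1)
            neighbors = np.argsort(d2)[:k]
            pred[i] = 1 if np.all(y_train[neighbors] == 1) else -1
    return pred

X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_train = np.array([-1, -1, 1, 1])
X_new = np.array([[0.2, 0.2], [5.0, 5.5]])
svm_scores = np.array([0.1, 0.2])  # both samples fall inside the risk area
pred = ra_svm_predict(svm_scores, X_train, y_train, X_new, delta=0.5, k=2)
# first sample: both neighbors negative -> -1; second: both positive -> +1
```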
The idea of combining k-NN with a region around the SVM’s decision boundary in order to control false positives was first introduced in [1] to solve a problem of automatic triage. Our work builds upon and further explores those ideas. Some of the main advances in our work herein are:
- We develop a more effective technique for selecting the SVM’s sensitive region;
- The k-NN classifier now works in the SVM’s high-dimensional feature space (using the same kernel), instead of the original feature space of the data, allowing a much better data discrimination;
- We propose a novel technique that runs up to five times faster than the standard method and offers similar quality performance;
- We evaluate the proposed methods (and the major techniques in the literature) on several standard benchmarks, from different sources and sizes, and in different scenarios (e.g., unbalanced data).
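The second point above relies on computing k-NN distances in the kernel-induced feature space via the kernel trick, $\|\phi(\mathbf{x}) - \phi(\mathbf{z})\|^2 = K(\mathbf{x},\mathbf{x}) - 2K(\mathbf{x},\mathbf{z}) + K(\mathbf{z},\mathbf{z})$. A minimal sketch, with an RBF kernel as an illustrative choice:

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2), computed pairwise."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def kernel_knn_indices(X_train, x, k, gamma=1.0):
    """Indices of the k nearest training points in the RBF feature space."""
    K_x = rbf_kernel(x[None, :], X_train, gamma)[0]  # K(x, x_i)
    K_ii = np.ones(len(X_train))                     # K(x_i, x_i) = 1 for RBF
    d2_feature = 1.0 - 2.0 * K_x + K_ii              # K(x, x) = 1 as well
    return np.argsort(d2_feature)[:k]
```

For the RBF kernel the feature-space distance is monotone in the input-space distance, so the neighbor ranking coincides with Euclidean k-NN; for other kernels (e.g., polynomial) the rankings can differ, which is precisely why searching in the same feature space as the SVM matters.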
The requirement of keeping the false positive rate bounded below a certain level while minimizing the false negative rate is also called the Neyman–Pearson classification paradigm [2], [3]. The requirement can also be stated as maximizing the accuracy (correct predictions) while keeping the false positive rate bounded. Thus, given a user-specified threshold $\alpha$, our objective is to:

$$\min_{f} \; FN_{rate}(f) \quad \text{subject to} \quad FP_{rate}(f) \le \alpha \qquad (1)$$
First, we briefly discuss SVM and its features. After that, we discuss alternatives to the problem of SVM-based Neyman–Pearson classification. Next, we introduce our methods, with four alternatives to define the risk area and solve the low false positive learning problem. We then present the evaluation methodology used to validate the proposed methods and compare them with alternative algorithms in the literature. Finally, we conclude the paper and point out some possible future research opportunities.
Support vector machines
In a typical classification setting, we are given a sample of training vectors $\mathbf{x}_1, \ldots, \mathbf{x}_n \in \mathbb{R}^d$, each belonging to one of two classes, indicated by the respective labels $y_1, \ldots, y_n \in \{-1, +1\}$. The task is then to find a function $f: \mathbb{R}^d \to \{-1, +1\}$ that accurately predicts the label $y$ when presented with a new sample $\mathbf{x}$ [4].
Support Vector Machines (SVM) are among the most effective methods for binary classification [4]. The idea is to find the maximum-margin hyperplane ($\mathbf{w}^\top \phi(\mathbf{x}) + b = 0$) in a high-dimensional space that
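In the kernelized setting, the resulting decision function takes the familiar support-vector expansion $f(\mathbf{x}) = \mathrm{sign}\big(\sum_i \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\big)$. A minimal sketch with hand-picked (hypothetical) dual coefficients, only to illustrate the form; in practice the coefficients come from solving the SVM training problem:

```python
import numpy as np

def rbf(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def svm_decision(x, support_vectors, sv_labels, alphas, b, gamma=0.5):
    """Kernel expansion: sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * rbf(x, s, gamma)
               for a, y, s in zip(alphas, sv_labels, support_vectors)) + b

def svm_predict(x, support_vectors, sv_labels, alphas, b, gamma=0.5):
    score = svm_decision(x, support_vectors, sv_labels, alphas, b, gamma)
    return 1 if score > 0 else -1

# Hypothetical two-support-vector model, one per class.
sv = np.array([[1.0, 1.0], [-1.0, -1.0]])
sv_y = np.array([1, -1])
alphas = np.array([1.0, 1.0])
b = 0.0
p1 = svm_predict(np.array([0.9, 0.9]), sv, sv_y, alphas, b)    # near the +1 SV
p2 = svm_predict(np.array([-0.9, -0.9]), sv, sv_y, alphas, b)  # near the -1 SV
```

The magnitude of `svm_decision` (the distance to the boundary) is also the natural confidence measure used later to define the risk area.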
Related work
There are many practical applications that require the classifier to produce a very low false positive rate. Therefore, several studies have been conducted to develop classifiers in this sense, which include techniques based on Naïve Bayes [6], [7], boosting [8], [9], [10], data compression [11], neural networks [12], ensemble learning [13], partial least squares [14], and cascade of classifiers [15], [16].
Support vector machines are among the most powerful algorithms for binary
The Risk Area SVM classifier
The Risk Area SVM (RA-SVM) is an extension of the traditional support vector machine classifier that incorporates the ability to control the false positive rate to a user-specified maximum. It is grounded on two facts that generally occur on SVMs:
- 1.
Most misclassifications are close to the decision boundary. In a support vector classifier, the further away a point is from the hyperplane, the more confident one is in its classification [31]. This is a well-known fact about SVM, and it is the primary
Evaluation methodology
In this paper, we followed two experimental setups. For the comparison of the RA-SVM and RA-SVM-SV forms with Bias Shift (BS), One-Class SVM (OC-SVM), and Asymmetric SVM (ASVM), we implemented all the classifiers and chose a set of datasets with different characteristics to measure how well the false positive rate is controlled. Section 5.1 discusses the performance metric used to measure how well each classifier satisfies the Neyman–Pearson criterion (Eq. (1)), while Section 5.2 presents the
Experiments and results
In our previous experiments (not reported in this paper) we discovered that the OSSRA form of Risk Area SVM performed better on average than the other three versions, both regarding NP-score and FP rates. For simplicity, in this paper, we only list the results for the OSSRA and OSSRA-SV. However, the practitioner must be aware that for a particular dataset one of the other three versions may achieve better results. Additional results with the other forms of Risk Area SVM are presented in the
Discussion
Besides the better performance of the OSSRA, other results may be of interest to practitioners. Bias Shift seems to be a reasonable alternative method for low false positive learning, while One-Class SVM is almost never the best solution and often functions as a no-classifier, that is, it assigns all data to the negative class. As for ASVM, Wu et al. [29] report results better than BS, which we could not reproduce in these experiments. However, ASVM was developed and tested
Conclusions
Controlling false positives (or false alarms) is paramount in several machine learning problems varying from spam filtering to computer-aided diagnosis solutions. In this work, we have proposed a new method for controlling false positives for support vector machines, which we call Risk Area SVM (RA-SVM).
Our approach was based on the hypothesis that the majority of the misclassified points are usually around the decision boundary [32]. RA-SVM is based on a risk area around the SVM’s decision
Acknowledgments
We thank the Associate Editor and reviewers for their valuable comments. This work was supported by FAPESP (Grant #2010/05647-4), CNPq (Grants #304352/2012-8, and #477662/2013-7), CAPES/DeepEyes, and Microsoft Research.
References (38)
- et al., An asymmetric classifier based on partial least squares, Pattern Recogn. (2010)
- et al., Cost-sensitive boosting for classification of imbalanced data, Pattern Recogn. (2007)
- et al., Cost-sensitive support vector machine for semi-supervised learning, Proc. Comput. Sci. (2013)
- et al., A combination of support vector machine and k-nearest neighbors for machine fault detection, Appl. Artif. Intell. (2013)
- et al., A Neyman–Pearson approach to statistical learning, IEEE Trans. Inform. Theory (2005)
- Performance measures for Neyman–Pearson classification, IEEE Trans. Inform. Theory (2007)
- et al., Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond (2002)
- et al., Least squares support vector machine classifiers, Neural Process. Lett. (1999)
- K.-M. Schneider, A comparison of event models for Naive Bayes anti-spam e-mail filtering, in: Proceedings of the Tenth...
- I. Androutsopoulos, J. Koutsias, K.V. Chandrinos, C.D. Spyropoulos, An experimental comparison of Naive Bayesian and...
- Fast and robust classification using asymmetric adaboost and a detector cascade, Adv. Neural Inform. Process. Syst.
- Spam filtering using statistical data compression models, Mach. Learn. Res.
- Training cost-sensitive neural networks with methods addressing the class imbalance problem, IEEE Trans. Knowl. Data Eng.
- Optimizing classifiers for imbalanced training sets, Adv. Neural Inform. Process. Syst. 11: Proc. 1998 Conf.
☆ This paper has been recommended for acceptance by Dacheng Tao.