DOI: 10.1145/1141277.1141411
Article

A methodology for comparing classifiers that allow the control of bias

Published: 23 April 2006

Abstract

This paper presents False Positive-Critical Classifiers Comparison, a new technique for pairwise comparison of classifiers that allow the control of bias. Naïve Bayes, k-Nearest Neighbour and Support Vector Machine classifiers were evaluated on five datasets of unsolicited and legitimate e-mail messages to confirm the advantage of the technique over Receiver Operating Characteristic (ROC) curves. The results suggest that the technique may be useful for choosing the better classifier when the ROC curves do not show clear differences, and for showing that the difference between two classifiers is not significant when ROC suggests that it might be. Spam filtering is a typical application for such a comparison tool, as it requires a classifier to be biased toward negative prediction and to have some upper limit on the rate of false positives. Finally, a summary of the evaluation is presented, which confirms that Support Vector Machines outperform the other methods in most cases, while the Naïve Bayes classifier works well in a narrow but relevant range of false positive rates.
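The abstract does not spell out the comparison procedure, so the following is only a minimal sketch of the underlying idea: comparing two score-producing classifiers by the true positive rate each can reach while its false positive rate stays under a fixed limit. It is not the paper's technique. The function name `tpr_at_fpr_limit`, the toy scores, and the chosen limits are all hypothetical and exist purely for illustration.

```python
from typing import Sequence


def tpr_at_fpr_limit(scores: Sequence[float],
                     labels: Sequence[int],
                     max_fpr: float) -> float:
    """Best true positive rate achievable while keeping FPR <= max_fpr.

    labels: 1 = spam (positive), 0 = legitimate (negative).
    A message is predicted spam when its score is >= the threshold.
    Illustrative helper only; not taken from the paper.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # A threshold above every score flags nothing: TPR = 0 and FPR = 0.
    best_tpr = 0.0
    # Try each distinct score as a decision threshold (i.e. vary the bias).
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if fp / n_neg <= max_fpr:
            best_tpr = max(best_tpr, tp / n_pos)
    return best_tpr


# Made-up spam scores from two hypothetical classifiers on the same 8 messages.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.05]
scores_b = [0.9, 0.6, 0.55, 0.5, 0.95, 0.4, 0.3, 0.2]

for limit in (0.0, 0.25):
    print(limit,
          tpr_at_fpr_limit(scores_a, labels, limit),
          tpr_at_fpr_limit(scores_b, labels, limit))
```

On this toy data both classifiers catch every spam message once a 25% false positive rate is tolerated, but with no false positives allowed the first still catches half of the spam and the second catches none. Differences of this kind, confined to the low false positive region, are what matter for an application like spam filtering.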

Cited By

  • (2009) Computer-assisted pit-pattern classification in different wavelet domains for supporting dignity assessment of colonic polyps. Pattern Recognition 42:6 (1180-1191). DOI: 10.1016/j.patcog.2008.07.012. Online publication date: 1-Jun-2009

Published In

SAC '06: Proceedings of the 2006 ACM symposium on Applied computing
April 2006
1967 pages
ISBN:1595931082
DOI:10.1145/1141277
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SAC06

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
