Article

A methodology for comparing classifiers that allow the control of bias

Authors: Anton Zamolotskikh, Sarah Jane Delany, Pádraig Cunningham

SAC '06: Proceedings of the 2006 ACM symposium on Applied computing

Pages 582 - 587

https://doi.org/10.1145/1141277.1141411

Published: 23 April 2006


Abstract

This paper presents False Positive-Critical Classifiers Comparison, a new technique for pairwise comparison of classifiers that allow the control of bias. To confirm the advantage of the technique over Receiver Operating Characteristic (ROC) curves, Naïve Bayes, k-Nearest Neighbour and Support Vector Machine classifiers were evaluated on five datasets containing unsolicited and legitimate e-mail messages. The results suggest that the technique may be useful for choosing the better classifier when the ROC curves do not show a clear overall difference, and for showing that the difference between two classifiers is not significant when the ROC curves suggest that it might be. Spam filtering is a typical application for such a comparison tool, as it requires a classifier to be biased toward negative prediction and to have some upper limit on the rate of false positives. Finally, a summary of the evaluation is presented; it confirms that Support Vector Machines outperform the other methods in most cases, while the Naïve Bayes classifier works well in a narrow but relevant range of false positive rates.
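The abstract does not spell out how the proposed comparison works, so the following is only a minimal sketch of the underlying idea of comparing two classifiers under an upper limit on the false positive rate. It is not the authors' procedure. The function name `tpr_at_max_fpr`, the toy scores, and the convention that label 1 means spam and a higher score means "more likely spam" are all assumptions made for illustration.

```python
def tpr_at_max_fpr(scores, labels, max_fpr):
    """Return the best true positive rate reachable with FPR <= max_fpr.

    Hypothetical helper, not taken from the paper.
    scores: classifier outputs, higher means "more likely spam".
    labels: 1 for spam (positive), 0 for legitimate mail (negative).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one example of each class")

    # A threshold above every score flags nothing: FPR = 0 and TPR = 0.
    best_tpr = 0.0
    # Sweep every distinct score as a decision threshold (the bias control).
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if fp / negatives <= max_fpr:
            best_tpr = max(best_tpr, tp / positives)
    return best_tpr


# Toy test set (made up): four spam messages followed by four legitimate ones.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.05]   # hypothetical classifier A
scores_b = [0.9, 0.6, 0.5, 0.45, 0.95, 0.3, 0.2, 0.1]  # hypothetical classifier B

for cap in (0.0, 0.25):
    tpr_a = tpr_at_max_fpr(scores_a, labels, cap)
    tpr_b = tpr_at_max_fpr(scores_b, labels, cap)
    print(f"max FPR {cap:.2f}: TPR A = {tpr_a:.2f}, TPR B = {tpr_b:.2f}")
```

On this toy data the two classifiers tie when a quarter of legitimate mail may be misclassified, but A is clearly ahead when no false positives are tolerated. This is the kind of difference that matters when only a low false positive range is acceptable. A real comparison would also need a significance test over the paired predictions, which this sketch omits.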


Cited By

Häfner M., Kwitt R., Uhl A., Wrba F., Gangl A., Vécsei A. (2009). Computer-assisted pit-pattern classification in different wavelet domains for supporting dignity assessment of colonic polyps. Pattern Recognition, 42:6 (1180-1191). DOI: 10.1016/j.patcog.2008.07.012. Online publication date: 1-Jun-2009.
https://dl.acm.org/doi/10.1016/j.patcog.2008.07.012

Index Terms

A methodology for comparing classifiers that allow the control of bias
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Document filtering
      2. Information extraction


Information & Contributors

Information

Published In

SAC '06: Proceedings of the 2006 ACM symposium on Applied computing

April 2006

1967 pages

ISBN:1595931082

DOI:10.1145/1141277

Conference Chair:
Hisham M. Haddad
Kennesaw State University, Kennesaw, Georgia

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Article

Conference

SAC06

Sponsor:

SIGAPP

SAC06: The 2006 ACM Symposium on Applied Computing

April 23 - 27, 2006

Dijon, France

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%



Bibliometrics & Citations

Article Metrics

Total Citations: 1
Total Downloads: 239
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 0

Reflects downloads up to 30 Jan 2025

