Evaluating and Comparing Classifiers: Review, Some Recommendations and Limitations

  • Conference paper
Proceedings of the 10th International Conference on Computer Recognition Systems CORES 2017 (CORES 2017)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 578)

Abstract

Evaluating the performance of a supervised classification learning method, that is, its predictive ability on independent data, is a central task in machine learning. It is also almost unthinkable to publish research on a new classifier without comparing it with already existing ones. This paper reviews the most important aspects of the classifier evaluation process, including the choice of evaluation metrics (scores) and the statistical comparison of classifiers. A critical view, recommendations, and limitations of the reviewed methods are presented. The article provides a quick guide to the complexity of the classifier evaluation process and warns the reader against common bad habits.
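
The two ingredients named in the abstract, metric choice and statistical comparison of classifiers, can be illustrated with a minimal, purely hypothetical Python sketch (not the procedure prescribed in the paper): a few off-the-shelf scikit-learn classifiers are scored by 10-fold cross-validated accuracy on a handful of benchmark datasets, and the resulting per-dataset scores are compared with the Friedman test. The datasets, models, and fold count below are illustrative assumptions.

```python
# Illustrative sketch only: cross-validated accuracy for several classifiers
# on several datasets, followed by a Friedman test on the per-dataset scores.
# Dataset choices, models and the 10-fold setting are assumptions for the demo.
from scipy.stats import friedmanchisquare
from sklearn.datasets import load_breast_cancer, load_digits, load_iris, load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

datasets = {
    "iris": load_iris(),
    "wine": load_wine(),
    "cancer": load_breast_cancer(),
    "digits": load_digits(),
}
classifiers = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "tree": DecisionTreeClassifier(random_state=0),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

# One mean 10-fold accuracy per (classifier, dataset) pair.
scores = {name: [] for name in classifiers}
for ds_name, ds in datasets.items():
    for clf_name, clf in classifiers.items():
        acc = cross_val_score(clf, ds.data, ds.target, cv=10, scoring="accuracy").mean()
        scores[clf_name].append(acc)
        print(f"{ds_name:>7s}  {clf_name:>7s}  accuracy = {acc:.3f}")

# Friedman test: do the classifiers' average ranks differ across datasets?
stat, p = friedmanchisquare(*scores.values())
print(f"Friedman chi-square = {stat:.3f}, p = {p:.3f}")
# A small p-value would be followed by a post-hoc procedure (e.g. Nemenyi or Holm).
```

With only four datasets the Friedman test has very little power; in practice such a comparison is run over many more datasets and, when the null hypothesis is rejected, followed by a post-hoc procedure such as the Nemenyi or Holm test.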

Author information

Corresponding author

Correspondence to Katarzyna Stąpor.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Stąpor, K. (2018). Evaluating and Comparing Classifiers: Review, Some Recommendations and Limitations. In: Kurzynski, M., Wozniak, M., Burduk, R. (eds) Proceedings of the 10th International Conference on Computer Recognition Systems CORES 2017. CORES 2017. Advances in Intelligent Systems and Computing, vol 578. Springer, Cham. https://doi.org/10.1007/978-3-319-59162-9_2

  • DOI: https://doi.org/10.1007/978-3-319-59162-9_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59161-2

  • Online ISBN: 978-3-319-59162-9

  • eBook Packages: Engineering, Engineering (R0)
