Evaluating and Comparing Classifiers: Review, Some Recommendations and Limitations

  • Conference paper
Proceedings of the 10th International Conference on Computer Recognition Systems CORES 2017 (CORES 2017)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 578)

Abstract

Evaluating the performance of a supervised classification learning method, that is, its predictive ability on independent data, is a central task in machine learning. It is also almost unthinkable to publish research on a new classifier without comparing it with already existing ones. This paper reviews the most important aspects of the classifier evaluation process, including the choice of evaluation metrics (scores) and the statistical comparison of classifiers. A critical view, recommendations, and limitations of the reviewed methods are presented. The article provides a quick guide to the complexity of the classifier evaluation process and warns the reader against common bad habits.
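
The two ingredients named in the abstract, metric choice and statistical comparison of classifiers, can be illustrated with a minimal, purely hypothetical Python sketch (not the procedure prescribed in the paper): a few off-the-shelf scikit-learn classifiers are scored by 10-fold cross-validated accuracy on a handful of benchmark datasets, and the resulting per-dataset scores are compared with the Friedman test. The datasets, models, and fold count below are illustrative assumptions.

```python
# Illustrative sketch only: cross-validated accuracy for several classifiers
# on several datasets, followed by a Friedman test on the per-dataset scores.
# Dataset choices, models and the 10-fold setting are assumptions for the demo.
from scipy.stats import friedmanchisquare
from sklearn.datasets import load_breast_cancer, load_digits, load_iris, load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

datasets = {
    "iris": load_iris(),
    "wine": load_wine(),
    "cancer": load_breast_cancer(),
    "digits": load_digits(),
}
classifiers = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "tree": DecisionTreeClassifier(random_state=0),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

# One mean 10-fold accuracy per (classifier, dataset) pair.
scores = {name: [] for name in classifiers}
for ds_name, ds in datasets.items():
    for clf_name, clf in classifiers.items():
        acc = cross_val_score(clf, ds.data, ds.target, cv=10, scoring="accuracy").mean()
        scores[clf_name].append(acc)
        print(f"{ds_name:>7s}  {clf_name:>7s}  accuracy = {acc:.3f}")

# Friedman test: do the classifiers' average ranks differ across datasets?
stat, p = friedmanchisquare(*scores.values())
print(f"Friedman chi-square = {stat:.3f}, p = {p:.3f}")
# A small p-value would be followed by a post-hoc procedure (e.g. Nemenyi or Holm).
```

With only four datasets the Friedman test has very little power; in practice such a comparison is run over many more datasets and, when the null hypothesis is rejected, followed by a post-hoc procedure such as the Nemenyi or Holm test.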

Author information

Corresponding author

Correspondence to Katarzyna Stąpor.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Stąpor, K. (2018). Evaluating and Comparing Classifiers: Review, Some Recommendations and Limitations. In: Kurzynski, M., Wozniak, M., Burduk, R. (eds) Proceedings of the 10th International Conference on Computer Recognition Systems CORES 2017. CORES 2017. Advances in Intelligent Systems and Computing, vol 578. Springer, Cham. https://doi.org/10.1007/978-3-319-59162-9_2

  • DOI: https://doi.org/10.1007/978-3-319-59162-9_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59161-2

  • Online ISBN: 978-3-319-59162-9

  • eBook Packages: Engineering, Engineering (R0)
