Beyond AP: a new evaluation index for multiclass classification task accuracy

Applied Intelligence

Abstract

Average precision (AP) and related evaluation indices have long been used ubiquitously in classification tasks. However, they have defects and struggle to provide overall and individual evaluations at the same time. In practice, we must balance overall and individual performance to satisfy diverse demands. To this end, we propose a new index for multiclass classification tasks, named \({R^{\prime }}\), which is an unbiased estimator of AP. Specifically, we improve the R index by taking into account the numerical differences between the real and predicted labels of each class. We evaluate its effectiveness and robustness on the MNIST and CIFAR-10 datasets. Experimental results show that it is positively correlated with several related indices. More importantly, it yields both overall and individual evaluations, which can help improve training processes and model selection. Furthermore, as an evaluation framework, the index can be extended to any classification task, which suggests broad application prospects.
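For orientation, the baseline that the abstract compares against can be sketched in a few lines. The example below is a minimal illustration of standard ranking-based AP, computed once per class (an individual evaluation) and then averaged across classes (an overall evaluation). It is not the \({R^{\prime }}\) index, whose formula is not given in this preview. The function names `average_precision` and `per_class_and_mean_ap`, and the toy labels and scores, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def average_precision(is_positive: Sequence[bool], scores: Sequence[float]) -> float:
    """Standard AP: mean of precision@k over the ranks k that hold a positive."""
    # Rank samples from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if is_positive[idx]:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Convention assumed here: AP is 0.0 when the class has no positives.
    return precision_sum / hits if hits else 0.0


def per_class_and_mean_ap(
    labels: Sequence[int], class_scores: Sequence[Sequence[float]]
) -> Tuple[Dict[int, float], float]:
    """Return one-vs-rest AP for each class and their unweighted mean."""
    n_classes = len(class_scores[0])
    per_class: Dict[int, float] = {}
    for c in range(n_classes):
        positives: List[bool] = [y == c for y in labels]
        scores_c: List[float] = [row[c] for row in class_scores]
        per_class[c] = average_precision(positives, scores_c)
    mean_ap = sum(per_class.values()) / n_classes
    return per_class, mean_ap


if __name__ == "__main__":
    # Toy data (hypothetical): three samples, two classes.
    labels = [0, 1, 1]
    class_scores = [[0.9, 0.1], [0.2, 0.8], [0.95, 0.05]]
    per_class, mean_ap = per_class_and_mean_ap(labels, class_scores)
    print(per_class, mean_ap)
```

Reporting the per-class values alongside their mean is one simple way to see both views at once. The mean alone can hide a class that is ranked poorly.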



Acknowledgments

This work is sponsored by the National Key Research and Development Program of China (2018YFB0204301) and the Open Fund of PDL (6142110190201).

Author information

Corresponding author

Correspondence to Huayou Su.


Cite this article

Zhang, K., Su, H. & Dou, Y. Beyond AP: a new evaluation index for multiclass classification task accuracy. Appl Intell 51, 7166–7176 (2021). https://doi.org/10.1007/s10489-021-02223-7


  • DOI: https://doi.org/10.1007/s10489-021-02223-7
