Beyond AP: a new evaluation index for multiclass classification task accuracy

Zhang, Kaifang; Su, Huayou; Dou, Yong

doi:10.1007/s10489-021-02223-7


Published: 26 February 2021

Applied Intelligence, Volume 51, pages 7166–7176 (2021)


Abstract

Average precision (AP) and many related evaluation indices have long been used ubiquitously in classification tasks. However, they have defects and can hardly provide overall and individual evaluations at the same time. In practice, we have to strike a balance between overall and individual performance to satisfy diverse demands. To this end, we propose a new index for multiclass classification tasks, named \(R'\), which is an unbiased estimator of AP. Specifically, we improve the R index by taking into account the numerical differences between the real labels and the predicted labels of each class. We evaluate its effectiveness and robustness on the MNIST and CIFAR-10 datasets. Experimental results show that it is positively correlated with some related indices. More importantly, it provides both overall and individual evaluations, which can benefit training processes and model selection. Furthermore, as an evaluation framework, the index can be extended to any classification task, suggesting broad application prospects.
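To make the distinction between overall and individual evaluation concrete, here is a minimal Python sketch. It does not reproduce the paper's \(R'\) formula, which is not given in this excerpt. Instead it assumes, for illustration only, a one-vs-rest R score of the classical form "hit rate minus false-alarm rate" (also known as the Hanssen-Kuipers discriminant or true skill statistic), computed for each class and then macro-averaged into a single overall score. The function names confusion_counts, r_score and per_class_and_overall are hypothetical.

```python
from typing import Dict, Hashable, Sequence, Tuple


def confusion_counts(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     cls: Hashable) -> Tuple[int, int, int, int]:
    """One-vs-rest counts (tp, fp, fn, tn) for a single class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == cls and p == cls:
            tp += 1
        elif t != cls and p == cls:
            fp += 1
        elif t == cls and p != cls:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def r_score(tp: int, fp: int, fn: int, tn: int) -> float:
    """Assumed R score: hit rate minus false-alarm rate (NOT the paper's R')."""
    hit_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return hit_rate - false_alarm_rate


def per_class_and_overall(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                          classes: Sequence[Hashable]) -> Tuple[Dict[Hashable, float], float]:
    """Individual (per-class) scores plus an overall macro-averaged score."""
    per_class = {c: r_score(*confusion_counts(y_true, y_pred, c)) for c in classes}
    overall = sum(per_class.values()) / len(per_class)
    return per_class, overall


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    per_class, overall = per_class_and_overall(y_true, y_pred, [0, 1, 2])
    print(per_class)  # {0: 0.25, 1: 0.75, 2: 0.5}
    print(overall)    # 0.5
```

The per-class dictionary plays the role of the individual evaluations, while the macro average condenses them into one overall number; weighting classes by frequency would be an equally plausible aggregation choice.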


Acknowledgments

This work is sponsored by the National Key Research and Development Program of China (2018YFB0204301) and the Open Fund of PDL (6142110190201).

Author information

Authors and Affiliations

College of Computer, National University of Defense Technology, Changsha, China
Kaifang Zhang, Huayou Su & Yong Dou


Corresponding author

Correspondence to Huayou Su.


About this article

Cite this article

Zhang, K., Su, H. & Dou, Y. Beyond AP: a new evaluation index for multiclass classification task accuracy. Appl Intell 51, 7166–7176 (2021). https://doi.org/10.1007/s10489-021-02223-7

Accepted: 20 January 2021
Published: 26 February 2021
Issue Date: October 2021
DOI: https://doi.org/10.1007/s10489-021-02223-7
