Abstract
Average precision (AP) and many related evaluation indices have long been used in classification tasks. However, they have known defects and can hardly provide both overall and individual evaluations. In practice, overall and individual performance must be balanced to satisfy diverse demands. To this end, we propose a new index for multiclass classification tasks, named \(R^{\prime}\), which is an unbiased estimator of AP. Specifically, we improve the R index by taking into account the numerical differences between the real labels and the predicted labels of each class. We evaluate its effectiveness and robustness on the MNIST and CIFAR-10 datasets. Experimental results show that it is positively correlated with several related indices. More importantly, it yields both overall and individual evaluations, which can benefit training and model selection. Furthermore, as an evaluation framework, the index can be extended to evaluate any classification task, which suggests broad application prospects.
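For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the standard ranking-based AP, computed one-vs-rest for each class and then averaged. It is not the proposed \(R^{\prime}\) index, whose formula is not given in this abstract. The function names, the one-vs-rest reading of "individual" as per-class, and the toy scores are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def average_precision(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """Standard AP: mean of precision@k taken at the rank of each positive sample."""
    # Rank samples by descending score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if is_positive[i]:
            hits += 1
            precision_sum += hits / rank
    # Convention assumed here: AP is 0.0 when a class has no positives.
    return precision_sum / hits if hits else 0.0


def per_class_and_mean_ap(
    score_matrix: Sequence[Sequence[float]],
    labels: Sequence[int],
    num_classes: int,
) -> Tuple[List[float], float]:
    """Return (AP for each class, unweighted mean over classes), one-vs-rest."""
    per_class = []
    for c in range(num_classes):
        class_scores = [row[c] for row in score_matrix]
        positives = [y == c for y in labels]
        per_class.append(average_precision(class_scores, positives))
    return per_class, sum(per_class) / num_classes


# Hypothetical toy data: 4 samples, 2 classes (rows are per-class scores).
toy_scores = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
toy_labels = [0, 1, 1, 0]
per_class_ap, mean_ap = per_class_and_mean_ap(toy_scores, toy_labels, num_classes=2)
print(per_class_ap, mean_ap)
```

The per-class list corresponds to an individual view and the mean to an overall view; the abstract's point is that a single index should expose both.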
Acknowledgments
This work is sponsored by the National Key Research and Development Program of China (2018YFB0204301) and the Open Fund of PDL (6142110190201).
Zhang, K., Su, H. & Dou, Y. Beyond AP: a new evaluation index for multiclass classification task accuracy. Appl Intell 51, 7166–7176 (2021). https://doi.org/10.1007/s10489-021-02223-7
DOI: https://doi.org/10.1007/s10489-021-02223-7