Abstract
Machine Learning studies often involve a series of computational experiments in which the predictive performance of multiple models is compared across one or more datasets. The results obtained are usually summarized through average statistics, either in numeric tables or simple plots. Such approaches fail to reveal interesting subtleties about algorithmic performance, including which observations an algorithm may find easy or hard to classify, and which observations within a dataset may present unique challenges. Recently, a methodology known as Instance Space Analysis (ISA) was proposed for visualizing algorithm performance across different datasets. This methodology relates predictive performance to estimated instance hardness measures extracted from the datasets. However, the original analysis treats an entire classification dataset as a single instance, and algorithm performance is reported for each dataset as an average error across all of its observations. In this paper, we develop a more fine-grained analysis by adapting the ISA methodology. The adapted version of ISA allows an individual classification dataset to be analyzed through a 2-D hardness embedding, which provides a visualization of the data according to the difficulty level of its individual observations. This enables deeper analyses of the relationships between instance hardness and the predictive performance of classifiers. We also provide an open-access Python package named PyHard, which encapsulates the adapted ISA and provides an interactive visualization interface. We illustrate through case studies how our tool can provide insights about data quality and algorithm performance in the presence of challenges such as noisy and biased data.
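To make the notion of instance hardness concrete, the snippet below is a minimal, illustrative sketch (not the PyHard implementation): it estimates the hardness of each observation as the average likelihood of misclassification across a small pool of scikit-learn classifiers, using cross-validated probability estimates. The dataset and pool are arbitrary choices made only for illustration.

```python
# Illustrative sketch only (not the PyHard implementation): estimate the hardness
# of each observation as the average likelihood of misclassification across a
# pool of classifiers, using cross-validated probability estimates.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
pool = [
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    RandomForestClassifier(n_estimators=100, random_state=0),
    GaussianNB(),
]

per_algo_error = []
for clf in pool:
    # Out-of-fold probability estimates for every observation.
    proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")
    p_true = proba[np.arange(len(y)), y]   # likelihood assigned to the true class
    per_algo_error.append(1.0 - p_true)    # likelihood of misclassification

instance_hardness = np.mean(per_algo_error, axis=0)  # average over the pool
print("Five hardest observations:", np.argsort(instance_hardness)[-5:])
```

Observations with large values of this score are the ones that most classifiers in the pool tend to misclassify, which is the kind of information the 2-D hardness embedding is designed to expose visually.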
Availability of data and material
The benchmark datasets and the outputs of their analysis are available in our open repository, under the folder ‘experiments’ (https://gitlab.com/ita-ml/pyhard). The real COVID dataset had its feature names anonymized.
Notes
Available in the Python rankaggregation package.
Recall that IH takes the average likelihood of misclassification of an instance \({\mathbf {x}}_i\) by the multiple algorithms in the pool \({\mathcal {A}}\); the threshold on IH therefore cannot be based on Proposition 1 (a formula sketch is given after these notes).
The benchmark datasets and the outputs of their analysis are available at https://gitlab.com/ita-ml/pyhard. The real COVID dataset in the repository had its feature names anonymized.
Online tool for Instance Space Analysis, available at https://www.matilda.unimelb.edu.au.
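For concreteness, one way to write the averaged misclassification likelihood referred to in the note above is the following (our notation for this sketch; the full definition is given in the main text):
\[ IH_{{\mathcal {A}}}({\mathbf {x}}_i) = \frac{1}{|{\mathcal {A}}|} \sum _{a \in {\mathcal {A}}} \big (1 - p_a(y_i \mid {\mathbf {x}}_i)\big ), \]
where \(p_a(y_i \mid {\mathbf {x}}_i)\) is the probability that algorithm \(a \in {\mathcal {A}}\) assigns to the true class \(y_i\) of instance \({\mathbf {x}}_i\). Because this quantity averages probabilities over several algorithms rather than taking the log-loss of a single classifier, the bounds of Proposition 1 do not carry over to IH directly.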
Acknowledgements
The authors are thankful to the São José dos Campos municipal health secretariat and IPPLAN (Instituto de Pesquisa e Planejamento) for providing data on COVID cases.
Funding
This work was partially supported by the Brazilian research agencies Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 (main grant 88887.507037/2020-00), CNPq (grant 307892/2020-4) and FAPESP (grant 2021/06870-3) and by the Australian Research Council (grant FL140100012).
Author information
Contributions
P. Y. A. Paiva implemented PyHard and PyISpace, along with all the parts of the framework dedicated to the ISA analysis of a single dataset, and ran the experiments with the label noise datasets. C. C. Moreno ran and analyzed the experiments with the COMPAS dataset. K. Smith-Miles contributed to the paper organization and the validation of the experiments, and is the proponent of the original ISA framework. M. G. Valeriano built, ran and analyzed the COVID dataset. A. C. Lorena proposed framing ISA for the analysis of a single dataset and supervised all the work. All authors contributed to the writing and organization of the paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Code availability
The source code of the cited Python packages used in the experiments section is available at the public repositories PyHard (https://gitlab.com/ita-ml/pyhard) and PyISpace (https://gitlab.com/ita-ml/pyispace).
Ethical approval
The anonymized COVID dataset was obtained through a partnership between the São José dos Campos health department and the project “Data science for fighting outbreaks, epidemics and pandemics in hospitals”, coordinated by researcher A. C. Lorena.
Consent to participate
Not Applicable.
Consent for publication
Not Applicable.
Additional information
Editor: Salvador Garcia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A Proof of proposition
Proposition 1
(cross-entropy bounds). For any classification problem with C classes there is a lower bound \(L_{\mathrm {lower}}\) and an upper bound \(L_{\mathrm {upper}}\) for the cross-entropy loss (also known as the log-loss) such that: if \({\mathrm {logloss}}({\mathbf {x}}_i) < L_{\mathrm {lower}}\), the prediction was correct; if \({\mathrm {logloss}}({\mathbf {x}}_i) > L_{\mathrm {upper}}\), the prediction was incorrect; and if \(L_{\mathrm {lower}} \le {\mathrm {logloss}}({\mathbf {x}}_i) \le L_{\mathrm {upper}}\), the prediction can be either correct or incorrect, where \({\mathrm {logloss}}({\mathbf {x}}_i)\) is the log-loss of instance \({\mathbf {x}}_i\). Specifically, these bounds can be set as \(L_{\mathrm {lower}}=-\log \big (\frac{1}{2}\big ) = \log 2\) and \(L_{\mathrm {upper}}=-\log \big (\frac{1}{C}\big ) = \log C\).
Proof
Given a multiclass setting consisting of C classes, the outcome of a classifier for an instance \({\mathbf {x}}_i\) is the predicted probability vector \([p_1, p_2, \dots , p_{C}]\), and the predicted class is defined as \(\mathop {\hbox {arg max}}\limits _j p_j\). The indicator \(y_{j, c} :=I_{j=c}\) marks the true class c, so that the log-loss reduces to \({\mathrm {logloss}}({\mathbf {x}}_i) = -\sum _{j=1}^{C} y_{j,c} \log p_j = -\log p_c\).
We first prove that if the classifier succeeds in correctly predicting the class of instance \({\mathbf {x}}_i\), then \({\mathrm {logloss}}({\mathbf {x}}_i) \le L_{\mathrm {upper}}\), and that the value of \(L_{\mathrm {upper}}\) is \(-\log \big (\frac{1}{C}\big )\). Without loss of generality, suppose \(j=1\) is the correct class, since the classes can always be reordered so that any one of them becomes the first in the set. In that case, \(\mathop {\hbox {arg max}}\limits _j p_j = 1\), which implies that
\[ p_1 \ge p_j, \quad \forall j \in \{2, \dots , C\}. \]
Summing all those inequalities results in
\[ (C-1)\, p_1 \ge \sum _{j=2}^{C} p_j. \]
On the other hand, \(\sum _j p_j = 1\), so that \(\sum _{j=2}^{C} p_j = 1 - p_1\). Therefore,
\[ (C-1)\, p_1 \ge 1 - p_1 \;\Rightarrow \; p_1 \ge \frac{1}{C} \;\Rightarrow \; {\mathrm {logloss}}({\mathbf {x}}_i) = -\log p_1 \le -\log \frac{1}{C} = L_{\mathrm {upper}}. \]
However, \({\mathrm {logloss}}({\mathbf {x}}_i) < L_{\mathrm {upper}}\) does not necessarily imply that the instance \({\mathbf {x}}_i\) was correctly classified. We show this by counterexample: take the particular predicted probability vector \(\big [\frac{1}{2} - \varepsilon , \frac{1}{2} + \varepsilon , 0, \dots , 0\big ]\), and define the right class as \(c=1\). For this vector, the classifier predicts class 2, since it has the highest probability. The log-loss value is
\[ {\mathrm {logloss}}({\mathbf {x}}_i) = -\log \Big (\frac{1}{2} - \varepsilon \Big ). \]
If we choose \(0< \varepsilon < \frac{1}{2} - \frac{1}{C}\), which is always possible for \(C \ge 3\), then
\[ \frac{1}{2} - \varepsilon > \frac{1}{C} \;\Rightarrow \; {\mathrm {logloss}}({\mathbf {x}}_i) = -\log \Big (\frac{1}{2} - \varepsilon \Big ) < -\log \frac{1}{C} = L_{\mathrm {upper}}, \]
even though the prediction is incorrect.
Specifically for binary problems with only two classes, if \({\mathrm {logloss}}({\mathbf {x}}_i) < - \log \frac{1}{2}\), then
\[ -\log p_1 < -\log \frac{1}{2} \;\Rightarrow \; p_1 > \frac{1}{2} \;\Rightarrow \; p_1 > p_2, \]
and the prediction is necessarily correct. So, in the particular case of binary classification problems, \(L_{\mathrm {lower}} = L_{\mathrm {upper}}\). Thus,
\[ L_{\mathrm {lower}} = L_{\mathrm {upper}} = -\log \frac{1}{2} = \log 2. \]
To find \(L_{\mathrm {lower}}\) in the general case, we first show that a multiclass problem can be reduced to a binary classification problem in the sense of the log-loss metric. Herewith, the vector \([p_1, p_2, \dots , p_C]\) is equivalent to \([p_1, p_2']\), with \(p_2' = \sum _{j=2}^{C} p_j\): in both cases, the log-loss value is the same, namely \(-\log p_1\) when the true class is \(c=1\). Thus, we assume \(L_{\mathrm {lower}} = -\log \frac{1}{2}\) and prove it by contradiction.
Assume that \({\mathrm {logloss}}({\mathbf {x}}_i) < -\log \frac{1}{2}\), that the correct class is \(c=1\), and that \(\exists p_k : p_k > p_1\) (classification error). Then,
\[ -\log p_1 < -\log \frac{1}{2} \;\Rightarrow \; p_1 > \frac{1}{2} \;\Rightarrow \; p_k > p_1 > \frac{1}{2} \;\Rightarrow \; p_1 + p_k > 1, \]
which contradicts \(\sum _j p_j = 1\). Therefore, \(L_{\mathrm {lower}} = -\log \frac{1}{2}\). \(\square\)
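As a complementary numerical illustration (a verification sketch we add here, not part of the original proof), the snippet below samples random predicted probability vectors and checks that log-loss values below \(\log 2\) only arise from correct predictions, while values above \(\log C\) only arise from incorrect ones.

```python
# Numerical sanity check of Proposition 1 (illustrative only, not part of the
# original proof): log-loss values below log(2) should only come from correct
# predictions, and values above log(C) only from incorrect ones.
import numpy as np

rng = np.random.default_rng(0)
for C in (2, 3, 5, 10):
    lower, upper = np.log(2), np.log(C)
    for _ in range(20_000):
        p = rng.dirichlet(np.ones(C))        # random predicted probability vector
        true = int(rng.integers(C))          # random true class
        logloss = -np.log(np.clip(p[true], 1e-15, 1.0))
        correct = int(p.argmax()) == true
        if logloss < lower:
            assert correct, "log-loss below log(2) must imply a correct prediction"
        if logloss > upper:
            assert not correct, "log-loss above log(C) must imply an incorrect prediction"
print("Proposition 1 bounds held on all sampled cases.")
```

Values of the log-loss between the two bounds can come from either correct or incorrect predictions, as the counterexample in the proof shows.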
Appendix B Additional figures
Additional figures from the analysis of the COMPAS dataset are presented here. They show the Lasso selections of easy and hard instances (Fig. 14) and distributions of some other attributes besides race, namely number of priors (Fig. 15a), age (Fig. 15b) and sex (Fig. 15c). Median values are represented by vertical dashed lines. A summary of these results is presented and discussed in Sect. 4.2 using Table 2.