Tetrahedron: Barycentric Measure Visualizer

Brzezinski, Dariusz; Stefanowski, Jerzy; Susmaga, Robert; Szczȩch, Izabela

doi:10.1007/978-3-319-71273-4_43


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10536)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases


Abstract

Each machine learning task comes equipped with its own set of performance measures. For example, there is a plethora of classification measures that assess predictive performance, a myriad of clustering indices, and equally many rule interestingness measures. Choosing the right measure requires careful thought, as it can influence model selection and thus the performance of the final machine learning system. However, analyzing and understanding measure properties is a difficult task. Here, we present Tetrahedron, a web-based visualization tool that aids the analysis of complete ranges of performance measures based on a two-by-two contingency matrix. The tool operates in a barycentric coordinate system using a 3D tetrahedron, which can be rotated, zoomed, cut, parameterized, and animated. The application is capable of visualizing predefined measures (86 currently), as well as helping prototype new measures by visualizing user-defined formulas.



1 Introduction

Classifier selection and evaluation are difficult tasks requiring time and knowledge about the underlying data. One of the most important ingredients when assessing classifiers is the chosen classification performance measure. An analogous decision has to be made in association rule mining, where the overwhelming number of generated rules is usually trimmed using a selected interestingness measure. However, researchers often carry out their experiments using only a few selected measures, without discussing their properties and justifying their choice simply by the measures' popularity.

To aid the analysis of properties of measures based on two-by-two contingency tables, we put forward Tetrahedron, a web-based visualization tool for analyzing entire ranges of measure values. The proposed application visualizes 4D data in 3D using the barycentric coordinate system [1, 2]. Tetrahedron produces 3D WebGL plots with zooming, rotating, animation, and detailed configuration capabilities. The presented tool can be used to compare properties of existing measures, as well as devise new metrics.

2 The Visualization Technique

A confusion matrix for binary classification (Table 1) consists of four entries: \( TP \), \( FP \), \( FN \), \( TN \). However, for a dataset of n examples these four entries are sum-constrained, as \( n = TP + FP + FN + TN \). Therefore, for a given constant n, any three values in the confusion matrix uniquely define the fourth. This property makes it possible to visualize any classification performance measure based on the two-class confusion matrix using a 4D barycentric coordinate system, which is tailored to sum-constrained data. The same holds for any \(2\times 2\) matrix, for example, those used to define rule interestingness measures [2].
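The sum constraint can be illustrated with a short sketch: given n and any three entries, the fourth follows, and dividing by n gives the four barycentric weights, which always sum to 1. The function names `complete_matrix` and `barycentric` are hypothetical helpers for illustration, not part of the Tetrahedron tool.

```python
from fractions import Fraction


def complete_matrix(n, tp=None, fp=None, fn=None, tn=None):
    """Recover the single missing confusion-matrix entry from the constraint n = TP + FP + FN + TN."""
    entries = {"TP": tp, "FP": fp, "FN": fn, "TN": tn}
    missing = [name for name, value in entries.items() if value is None]
    if len(missing) != 1:
        raise ValueError("exactly one entry must be left unspecified")
    known = sum(value for value in entries.values() if value is not None)
    if known > n:
        raise ValueError("the known entries already exceed n")
    entries[missing[0]] = n - known
    return entries


def barycentric(entries):
    """Normalize the four counts by n, giving exact weights that sum to 1 (n must be > 0)."""
    n = sum(entries.values())
    return {name: Fraction(value, n) for name, value in entries.items()}


matrix = complete_matrix(100, tp=40, fp=10, fn=5)  # TN is inferred as 45
weights = barycentric(matrix)                      # TP: 2/5, FP: 1/10, FN: 1/20, TN: 9/20
```

Exact fractions are used only so that the weights visibly sum to exactly 1; floating-point division works equally well for plotting.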

Table 1. Confusion matrix for two-class classification


The barycentric coordinate system is a coordinate system in which point locations are specified relative to the hyper-sides of a simplex. In a 4D barycentric coordinate system the simplex is a tetrahedron, with each dimension represented by one of its four vertices. Placing the vectors that represent \( TP \), \( FP \), \( FN \), \( TN \) at the vertices of a regular tetrahedron in 3D space yields the barycentric coordinate system depicted in Fig. 1.
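A minimal sketch of the mapping from a confusion matrix to a 3D point, assuming one particular regular tetrahedron: the four vertices below are pairwise equidistant (edge length \(2\sqrt{2}\)) and centred at the origin. This vertex placement and the name `to_cartesian` are illustrative assumptions; the orientation used by the Tetrahedron tool itself may differ.

```python
# Hypothetical vertex placement: any regular tetrahedron works. These four
# points are pairwise 2*sqrt(2) apart and their centroid is the origin.
VERTICES = {
    "TP": (1.0, 1.0, 1.0),
    "FP": (1.0, -1.0, -1.0),
    "FN": (-1.0, 1.0, -1.0),
    "TN": (-1.0, -1.0, 1.0),
}


def to_cartesian(tp, fp, fn, tn):
    """Map a confusion matrix to 3D as the weighted average of the four vertices."""
    n = tp + fp + fn + tn
    if n <= 0:
        raise ValueError("the matrix must contain at least one example")
    weights = {"TP": tp / n, "FP": fp / n, "FN": fn / n, "TN": tn / n}
    # Each coordinate is the barycentric-weighted sum of that coordinate over the vertices.
    return tuple(
        sum(weights[name] * VERTICES[name][axis] for name in VERTICES)
        for axis in range(3)
    )


corner = to_cartesian(10, 0, 0, 0)   # coincides with the TP vertex
center = to_cartesian(5, 5, 5, 5)    # the centroid, i.e. the origin here
```

Because only the ratios \(x/n\) enter the mapping, matrices that differ by a constant factor (e.g. n = 10 and n = 1000 with the same proportions) land on the same point.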

In this system, every confusion matrix \(\left[ {\begin{matrix} TP &{} FN \\ FP &{} TN \end{matrix}} \right] \) is represented as a point of the tetrahedron. Let us illustrate this fact with a few examples. Figure 1 shows a skeleton of a tetrahedron with four exemplary points:

one located at vertex \(\mathsf {TP}\), which represents \(\left[ {\begin{matrix} n &{} 0 \\ 0 &{} 0 \end{matrix}} \right] \),
one located in the middle of edge \(\mathsf {TP}\)–\(\mathsf {FP}\), which represents \(\left[ {\begin{matrix} n/2 &{} 0 \\ n/2 &{} 0 \end{matrix}} \right] \),
one located in the middle of face \(\triangle \mathsf {TP}\)–\(\mathsf {FP}\)–\(\mathsf {FN}\), which represents \(\left[ {\begin{matrix} n/3 &{} n/3 \\ n/3 &{} 0 \end{matrix}} \right] \),
one located in the middle of the tetrahedron, which represents \(\left[ {\begin{matrix} n/4 &{} n/4 \\ n/4 &{} n/4 \end{matrix}} \right] \).
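The four exemplary points follow directly from writing a point as the weighted average of the vertex positions. Writing \(\mathbf{v}_{\mathsf{TP}}, \mathbf{v}_{\mathsf{FP}}, \mathbf{v}_{\mathsf{FN}}, \mathbf{v}_{\mathsf{TN}}\) for the vertex positions (notation introduced here for illustration), and keeping the matrix layout \(\left[ {\begin{matrix} TP &{} FN \\ FP &{} TN \end{matrix}} \right]\) used above:

```latex
\mathbf{p} = \frac{TP\,\mathbf{v}_{\mathsf{TP}} + FP\,\mathbf{v}_{\mathsf{FP}} + FN\,\mathbf{v}_{\mathsf{FN}} + TN\,\mathbf{v}_{\mathsf{TN}}}{n}

\left[\begin{matrix} n & 0 \\ 0 & 0 \end{matrix}\right] \mapsto \mathbf{v}_{\mathsf{TP}},
\qquad
\left[\begin{matrix} n/2 & 0 \\ n/2 & 0 \end{matrix}\right] \mapsto \tfrac{1}{2}\left(\mathbf{v}_{\mathsf{TP}} + \mathbf{v}_{\mathsf{FP}}\right)

\left[\begin{matrix} n/3 & n/3 \\ n/3 & 0 \end{matrix}\right] \mapsto \tfrac{1}{3}\left(\mathbf{v}_{\mathsf{TP}} + \mathbf{v}_{\mathsf{FP}} + \mathbf{v}_{\mathsf{FN}}\right),
\qquad
\left[\begin{matrix} n/4 & n/4 \\ n/4 & n/4 \end{matrix}\right] \mapsto \tfrac{1}{4}\left(\mathbf{v}_{\mathsf{TP}} + \mathbf{v}_{\mathsf{FP}} + \mathbf{v}_{\mathsf{FN}} + \mathbf{v}_{\mathsf{TN}}\right)
```

The right-hand sides are, respectively, a vertex, the midpoint of an edge, the centroid of a face, and the centroid of the whole tetrahedron.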

One way of understanding this representation is to imagine a point in the tetrahedron as the center of mass of the examples in a confusion matrix. If all n examples are true positives, the entire mass of the predictions is at \( TP \) and the point coincides with vertex \(\mathsf {TP}\). If all examples are false negatives, the point lies at vertex \(\mathsf {FN}\), and so on. In general, whenever \(a > b\) (\(a, b \in \{ TP , FN , FP , TN \}\)), the point is closer to the vertex corresponding to a than to the vertex corresponding to b.
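The "closer to the larger entry" property can be checked with a short derivation, assuming (as above) a regular tetrahedron with edge length s and writing \(\lambda_x = x/n\) for the barycentric weights. The first equality is the standard identity for squared distances from a weighted average (the same one behind the parallel-axis theorem); the sums over \(\{x, y\}\) run over unordered pairs of distinct entries.

```latex
\lambda_x = \frac{x}{n}, \quad x \in \{TP, FP, FN, TN\}, \qquad
\mathbf{p} = \sum_{x} \lambda_x \mathbf{v}_x, \qquad \sum_{x} \lambda_x = 1

\lVert \mathbf{p} - \mathbf{v}_a \rVert^2
  = \sum_{x} \lambda_x \lVert \mathbf{v}_x - \mathbf{v}_a \rVert^2
  - \sum_{\{x, y\}} \lambda_x \lambda_y \lVert \mathbf{v}_x - \mathbf{v}_y \rVert^2
  = s^2 \Big( 1 - \lambda_a - \sum_{\{x, y\}} \lambda_x \lambda_y \Big)

\lVert \mathbf{p} - \mathbf{v}_a \rVert^2 - \lVert \mathbf{p} - \mathbf{v}_b \rVert^2
  = s^2 \left( \lambda_b - \lambda_a \right)
  = \frac{s^2}{n} \left( b - a \right) < 0
  \iff a > b
```

The pairwise term is the same for every vertex, so only \(\lambda_a\) distinguishes the distances; this relies on all edges having equal length, which is why a regular tetrahedron is used.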

Using the barycentric coordinate system makes it possible to depict the originally 4D data (two-class confusion matrices) as points in 3D. Moreover, an additional variable based on the depicted four values may be rendered as color. In the presented tool, we adapt this procedure to color-code the values of classification performance and rule interestingness measures. A more in-depth description of the visualization and its possible applications can be found in [1, 2].
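As a sketch of the color-coding step, the snippet below enumerates every integer confusion matrix for a small fixed n, evaluates accuracy on each, and maps the value to a color. The linear blue-to-red ramp is an assumption standing in for the tool's default color map (blue: 0, red: 1); the 3D position of each matrix would come from the barycentric-to-Cartesian mapping and is omitted here.

```python
from itertools import product


def confusion_matrices(n):
    """Yield every (TP, FP, FN, TN) tuple of non-negative integers summing to n."""
    for tp, fp, fn in product(range(n + 1), repeat=3):
        tn = n - tp - fp - fn
        if tn >= 0:
            yield tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of correctly classified examples."""
    return (tp + tn) / (tp + fp + fn + tn)


def blue_red(value):
    """Hypothetical linear RGB ramp: 0 -> pure blue, 1 -> pure red."""
    v = min(max(value, 0.0), 1.0)
    return (v, 0.0, 1.0 - v)


# Every grid point of the tetrahedron for n = 4, paired with its color.
points = [(m, blue_red(accuracy(*m))) for m in confusion_matrices(4)]
```

For n = 4 this yields 35 matrices; the tool's "rendering precisions" presumably correspond to choosing a coarser or finer grid of this kind, though that is an interpretation rather than something stated above.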

3 Tool Overview

The described visualization technique has been implemented as an interactive web-based application. An online version, compatible with all modern web browsers across different client platforms, is publicly available (Note 1). The application can visualize 86 predefined 4D measures, including 21 classification measures, 16 rule interestingness measures, and 49 general-purpose formulas based on a two-by-two matrix. Users can also visualize custom measures by providing their formulas. The main functionalities of the application are:

Interactive 3D tetrahedron visualization. The visualization (Fig. 2a) supports 86 predefined measures, rotating, zooming, four rendering precisions, saving as an HTML file with WebGL, and exporting images. The user can choose to visualize external views or inner layers and can control point padding.
Cross-sections. Measure values can also be analyzed by cutting the tetrahedron with a plane and inspecting the resulting slice. The application lets the user visualize cross-sections (Fig. 2b) that correspond to different class distributions. Interestingly, this particular kind of cross-section produces a 2D space analogous to that used in ROC charts.
Parameter animations. Several of the application's options can be animated: their values change automatically at constant intervals, creating an animation (Fig. 2c). Such animations can be useful when analyzing consecutive layers of the tetrahedron, the impact of measure parameters (e.g., the impact of \(\beta \) in the F\(_\beta \)-score), or the effect of changing class distributions on cross-sections.
Custom measure definition. It is possible to define a custom measure to be visualized by providing its formula (Fig. 2d).
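To make the F\(_\beta \) example concrete, the sketch below evaluates the F\(_\beta \)-score of one confusion matrix for a sequence of \(\beta \) values, one value per animation frame, and returns NaN where the measure is undefined (TP = FP = FN = 0). The helper names `f_beta` and `animate_beta` are hypothetical; this is not the tool's own formula syntax for custom measures.

```python
import math


def f_beta(tp, fp, fn, tn, beta=1.0):
    """F-beta score; TN is accepted for a uniform signature but does not affect the value."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # Undefined when there are no positive predictions and no positive examples.
    return math.nan if denom == 0 else (1 + b2) * tp / denom


def animate_beta(matrix, betas):
    """Evaluate one (TP, FP, FN, TN) matrix once per beta, i.e. once per animation frame."""
    return [f_beta(*matrix, beta=b) for b in betas]


frames = animate_beta((30, 10, 20, 40), [0.5, 1.0, 2.0])
```

For this matrix (more false negatives than false positives) the score falls as \(\beta \) grows, because larger \(\beta \) weights recall more heavily; sweeping \(\beta \) over the whole tetrahedron rather than one matrix is what the animation visualizes.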

Since classification accuracy is one of the most intuitive performance measures, let us use it to exemplify visualizations produced by our tool (Fig. 3) with the default (blue: 0, red: 1) color map. Confusion matrices with high numbers of \( FP \) and \( FN \) result in low accuracy (blue), whereas high \( TP \) and \( TN \) yield high accuracy (red). Cross-sections for two different class ratios show that on imbalanced data high accuracy can be achieved by trivial majority classifiers. More examples of visual-based analyses can be found in [1, 2].
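The majority-classifier observation can be reproduced numerically with a sketch of a cross-section, assuming that a cross-section fixes the class sizes P = TP + FN and N = FP + TN, and parameterizing the slice by TPR and FPR as in ROC space. The grid sampling and the name `cross_section` are illustrative choices, not the tool's implementation.

```python
def cross_section(n_pos, n_neg, steps=4):
    """Sample the slice with fixed class sizes on a (FPR, TPR) grid, as in ROC space."""
    grid = []
    for i in range(steps + 1):
        for j in range(steps + 1):
            tpr, fpr = i / steps, j / steps
            tp, fp = tpr * n_pos, fpr * n_neg
            # FN and TN are fixed by the class sizes once TP and FP are chosen.
            grid.append(((fpr, tpr), (tp, fp, n_pos - tp, n_neg - fp)))
    return grid


def accuracy(tp, fp, fn, tn):
    """Fraction of correctly classified examples."""
    return (tp + tn) / (tp + fp + fn + tn)


balanced = cross_section(50, 50)
imbalanced = cross_section(5, 95)
# The trivial "always predict negative" classifier sits at (FPR, TPR) = (0, 0).
always_negative_balanced = accuracy(*dict(balanced)[(0.0, 0.0)])      # 0.5
always_negative_imbalanced = accuracy(*dict(imbalanced)[(0.0, 0.0)])  # 0.95
```

On the 5:95 slice the trivial classifier already reaches 0.95 accuracy without detecting a single positive example, which is the effect the imbalanced cross-section makes visible.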

4 Conclusions

We propose Tetrahedron, a web-based visualization tool for analyzing and prototyping measures based on a two-by-two matrix. Its main features include interactive 3D WebGL barycentric plots, zooming, parameter animation, cross-sections, custom measure formulas, and one-click saving of plots. This functionality facilitates visual inspection of various measure properties, such as monotonicity, symmetries, maxima, or undefined values. Thus, the presented tool can be used to gain further understanding of existing machine learning measures, as well as to devise new ones.

Notes

1.
https://dabrze.shinyapps.io/Tetrahedron/. Source code: https://github.com/dabrze/tetrahedron (MIT License).

References

1. Brzezinski, D., Stefanowski, J., Susmaga, R., Szczȩch, I.: Visual-based analysis of classification measures with applications to imbalanced data. arXiv:1704.07122 (2017)
2. Susmaga, R., Szczȩch, I.: Can interestingness measures be usefully visualized? Int. J. Appl. Math. Comp. Sci. 25(2), 323–336 (2015)


Acknowledgments

NCN DEC-2013/11/B/ST6/00963, PUT Statutory Funds.

Author information

Authors and Affiliations

Institute of Computing Science, Poznan University of Technology, ul. Piotrowo 2, 60-965, Poznan, Poland
Dariusz Brzezinski, Jerzy Stefanowski, Robert Susmaga & Izabela Szczȩch


Corresponding author

Correspondence to Dariusz Brzezinski.

Editor information

Editors and Affiliations

Google Research, Google Inc., Zurich, Switzerland
Yasemin Altun
NASA Ames Research Center, Mountain View, USA
Kamalika Das
Oath, Sunnyvale, USA
Taneli Mielikäinen
Department of Computer Science, University of Bari Aldo Moro, Bari, Italy
Donato Malerba
Institute of Computing Science, Poznan University of Technology, Poznan, Poland
Jerzy Stefanowski
Laboratoire d’ Informatique (LIX), École Polytechnique, Palaiseau, France
Jesse Read
Department of Computer Science, Stanford University, Stanford, USA
Marinka Žitnik
Università degli Studi di Bari Aldo Moro, Bari, Italy
Michelangelo Ceci
Jožef Stefan Institute, Ljubljana, Slovenia
Sašo Džeroski


About this paper

Cite this paper

Brzezinski, D., Stefanowski, J., Susmaga, R., Szczȩch, I. (2017). Tetrahedron: Barycentric Measure Visualizer. In: Altun, Y., et al. Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2017. Lecture Notes in Computer Science, vol 10536. Springer, Cham. https://doi.org/10.1007/978-3-319-71273-4_43


DOI: https://doi.org/10.1007/978-3-319-71273-4_43
Published: 30 December 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71272-7
Online ISBN: 978-3-319-71273-4
eBook Packages: Computer Science, Computer Science (R0)
