Stability Analysis of Supervised Decision Boundary Maps

Oliveira, Artur A. A. M.; Espadoto, Mateus; Hirata Jr., Roberto; Telea, Alexandru C.

doi:10.1007/s42979-022-01662-4

Original Research
Published: 21 February 2023

SN Computer Science, Volume 4, article number 226 (2023)

Artur A. A. M. Oliveira,
Mateus Espadoto (ORCID: orcid.org/0000-0002-1922-4309),
Roberto Hirata Jr. &
Alexandru C. Telea

Abstract

Understanding how a machine learning classifier works is an important task in machine learning engineering. However, doing this is difficult for classifiers in general. We propose to leverage visualization methods for this task. For this, we extend a recent technique called Decision Boundary Map (DBM), which graphically depicts how a classifier partitions its input data space into decision zones separated by decision boundaries. Our extension, Supervised Decision Boundary Map (SDBM), uses a supervised, GPU-accelerated technique that computes bidirectional mappings between the data and projection spaces to address several shortcomings of DBM, such as limited accuracy and speed. We present several experiments that show that SDBM generates results that are easier to interpret, far less prone to noise, and significantly faster to compute than those of DBM, while maintaining the genericity and ease of use of DBM for any type of single-output classifier. Extending earlier work, we also show that SDBM is stable with respect to various types and amounts of changes of the training set used to construct the visualized classifiers. To our knowledge, this property has not been investigated for any comparable method for visualizing classifier decision maps, and it is essential for deploying such visualization methods to analyze real-world classification models.
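
The general idea of a decision boundary map can be illustrated with a minimal sketch. It is not the SDBM implementation: as a stated assumption, a linear PCA (computed with NumPy) stands in for the learned bidirectional mapping, a nearest-centroid rule stands in for the visualized classifier, and all function names (fit_pca_2d, decision_map, map_agreement) are hypothetical. Each pixel of a 2D grid is mapped back to data space by the inverse projection and colored by the label the classifier assigns there. The last lines retrain the classifier on a subsample and count the fraction of pixels whose label is unchanged, which is one simple way to think about stability under training-set changes, not necessarily the measure used in the article.

```python
import numpy as np

def fit_pca_2d(X):
    """Fit a 2-component PCA; a linear stand-in for a learned projection."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:2]

def project(X, mean, comps):
    """Direct mapping: data space -> 2D projection space."""
    return (X - mean) @ comps.T

def inverse_project(Y, mean, comps):
    """Inverse mapping: 2D projection space -> data space."""
    return Y @ comps + mean

def nearest_centroid_classifier(X, y):
    """Train a toy classifier and return its predict function."""
    labels = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in labels])

    def predict(Z):
        dist = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        return labels[dist.argmin(axis=1)]

    return predict

def decision_map(predict, mean, comps, bounds, resolution=64):
    """Label every pixel of a 2D grid by classifying its back-projection."""
    (xmin, xmax), (ymin, ymax) = bounds
    gx, gy = np.meshgrid(np.linspace(xmin, xmax, resolution),
                         np.linspace(ymin, ymax, resolution))
    pixels = np.column_stack([gx.ravel(), gy.ravel()])
    labels = predict(inverse_project(pixels, mean, comps))
    return labels.reshape(resolution, resolution)

def map_agreement(map_a, map_b):
    """Fraction of pixels that carry the same label in both maps."""
    return float(np.mean(map_a == map_b))

# Synthetic two-class data in 5 dimensions (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 5)),
               rng.normal(2.0, 1.0, (100, 5))])
y = np.repeat([0, 1], 100)

mean, comps = fit_pca_2d(X)
Y = project(X, mean, comps)
bounds = ((Y[:, 0].min(), Y[:, 0].max()), (Y[:, 1].min(), Y[:, 1].max()))

# Map for the classifier trained on all samples.
full_map = decision_map(nearest_centroid_classifier(X, y), mean, comps, bounds)

# Retrain on 80% of the samples (every fifth sample dropped), same projection.
keep = np.arange(len(X)) % 5 != 0
sub_map = decision_map(nearest_centroid_classifier(X[keep], y[keep]),
                       mean, comps, bounds)

agreement = map_agreement(full_map, sub_map)
print(f"pixel agreement between the two maps: {agreement:.3f}")
```

In this sketch the projection is held fixed while only the classifier is retrained. A projection that is itself learned from the training set would have to be refitted as well, so the comparison would then also reflect changes in the mapping.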

Data Availability

Not applicable.

Code Availability

Our implementation, along with all code used in our experiments, is publicly available at github.com/mespadoto/sdbm.

Funding

This study was financed in part by FAPESP grants 2015/22308-2, 2017/25835-9 and 2020/13275-1, and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001.

Author information

Artur A. A. M. Oliveira, Mateus Espadoto, Roberto Hirata Jr., Alexandru C. Telea have contributed equally to this work.

Authors and Affiliations

Institute of Mathematics and Statistics, University of São Paulo, Rua do Matão, 1010, São Paulo, 05508-090, Brazil
Artur A. A. M. Oliveira, Mateus Espadoto & Roberto Hirata Jr.
Department of Information and Computing Sciences, Utrecht University, Princetonplein 5, 3584 CC, Utrecht, The Netherlands
Alexandru C. Telea

Corresponding author

Correspondence to Mateus Espadoto.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances on Computer Vision, Imaging and Computer Graphics Theory and Applications” guest edited by Kadi Bouatouch, Augusto Sousa, Mounia Ziat and Helen Purchase.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Oliveira, A.A.A.M., Espadoto, M., Hirata Jr., R. et al. Stability Analysis of Supervised Decision Boundary Maps. SN COMPUT. SCI. 4, 226 (2023). https://doi.org/10.1007/s42979-022-01662-4

Received: 20 June 2022
Accepted: 30 December 2022
Published: 21 February 2023
DOI: https://doi.org/10.1007/s42979-022-01662-4
