Abstract
Understanding how a machine learning classifier works is an important task in machine learning engineering, but it is difficult to do in general for arbitrary classifiers. We propose to leverage visualization methods for this task. To this end, we extend a recent technique called Decision Boundary Map (DBM), which graphically depicts how a classifier partitions its input data space into decision zones separated by decision boundaries. Our extension, Supervised Decision Boundary Maps (SDBM), uses a supervised, GPU-accelerated technique that computes bidirectional mappings between the data and projection spaces to address several shortcomings of DBM, such as limited accuracy and speed. We present several experiments showing that SDBM generates results that are easier to interpret and far less prone to noise, and that it computes significantly faster than DBM, while maintaining the genericity and ease of use of DBM for any type of single-output classifier. Extending earlier work, we also show that SDBM is stable with respect to various types and amounts of change in the training set used to construct the visualized classifiers. To our knowledge, this property has not been investigated for any comparable method for visualizing classifier decision maps, and it is essential for deploying such visualization methods in the analysis of real-world classification models.
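The general idea of a decision map described above can be illustrated with a minimal sketch. It is not the SDBM implementation: as simplifying assumptions, a linear PCA stands in for the supervised, learned bidirectional mappings, a nearest-centroid rule stands in for the classifier under study, and all function names are hypothetical. Each pixel of a 2D grid is mapped back to the data space and colored by the class the classifier predicts there.

```python
import numpy as np

def fit_pca_2d(X):
    """Fit a 2-component PCA (assumed stand-in for the learned mappings)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:2]  # rows of vt are orthonormal principal directions

def project(X, mean, comps):
    """Direct mapping: data space -> 2D projection space."""
    return (X - mean) @ comps.T

def inverse_project(P, mean, comps):
    """Inverse mapping: 2D projection space -> data space."""
    return P @ comps + mean

def fit_centroids(X, y):
    """Train a nearest-centroid classifier (assumed stand-in classifier)."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict(X, classes, centroids):
    """Assign each sample the class of its closest centroid."""
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dist.argmin(axis=1)]

def decision_map(X, y, resolution=50):
    """Label every pixel of a 2D grid with the class predicted at the
    data-space point obtained by inverse-projecting that pixel."""
    mean, comps = fit_pca_2d(X)
    classes, centroids = fit_centroids(X, y)
    P = project(X, mean, comps)
    (xmin, ymin), (xmax, ymax) = P.min(axis=0), P.max(axis=0)
    gx, gy = np.meshgrid(np.linspace(xmin, xmax, resolution),
                         np.linspace(ymin, ymax, resolution))
    pixels = np.column_stack([gx.ravel(), gy.ravel()])
    labels = predict(inverse_project(pixels, mean, comps), classes, centroids)
    return labels.reshape(resolution, resolution)

# Synthetic example: two Gaussian blobs in 5D, separated along the first axis.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 5)),
               rng.normal(0.0, 1.0, (100, 5)) + np.array([8.0, 0, 0, 0, 0])])
y = np.repeat([0, 1], 100)
dmap = decision_map(X, y, resolution=50)
```

The resulting `dmap` array can be displayed as an image in which same-colored regions are decision zones and the color changes between them are decision boundaries. A linear inverse mapping only reaches a 2D plane of the data space, which is one reason the techniques discussed here rely on learned, nonlinear mappings instead.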
Data Availability
Not applicable.
Code Availability
Our implementation and all code used in our experiments are publicly available at github.com/mespadoto/sdbm.
Funding
This study was financed in part by FAPESP grants 2015/22308-2, 2017/25835-9 and 2020/13275-1, and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
This article is part of the topical collection “Advances on Computer Vision, Imaging and Computer Graphics Theory and Applications” guest edited by Kadi Bouatouch, Augusto Sousa, Mounia Ziat and Helen Purchase.
About this article
Cite this article
Oliveira, A.A.A.M., Espadoto, M., Hirata Jr., R. et al. Stability Analysis of Supervised Decision Boundary Maps. SN COMPUT. SCI. 4, 226 (2023). https://doi.org/10.1007/s42979-022-01662-4
DOI: https://doi.org/10.1007/s42979-022-01662-4