Abstract
Scatterplots are a popular technique for visualizing high-dimensional datasets with the help of linear and nonlinear dimension reduction methods. These methods map the original high-dimensional dataset directly onto scatterplot points and therefore incur a high computation cost. Moreover, despite many improvements in scatterplot visual effects, when the data volume is large the mapped data points overlap, resulting in low-quality visualization. In this paper, we propose a novel software tool that integrates five components for fast multiview visualization of high-dimensional datasets: sampling, dimension reduction, clustering, multiview collaborative analysis, and dimension re-arrangement. In our tool, the sampling component reduces the size of a dataset using random sampling to improve visualization efficiency, and the dimension reduction component reduces its dimensionality using principal component analysis (PCA) to improve visualization quality. Next, the clustering component applies fuzzy c-means clustering to the reduced dataset to reveal hidden patterns of the original dataset. Finally, the multiview collaborative analysis component combines a scatterplot with scatterplot matrices, enabling users to analyse a multidimensional dataset from different aspects at the same time. To optimize the visualization effects in the scatterplot matrices, we re-arrange their dimensions and adjust the positions of the scatterplots so that similar scatterplots are adjacent. As a result, compared with existing visualization tools that apply some of these techniques, our tool not only improves the efficiency of dimension reduction but also enhances the quality of visualization and enables more comprehensive analysis. We test our tool on different real datasets to demonstrate its effectiveness.
The experimental results confirm that our method is effective in terms of both visualization efficiency and visualization quality.
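The sampling and dimension reduction steps described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the function names `random_sample` and `pca`, the fixed random seed, and the choice to compute the principal components from an SVD of the centred data are assumptions made here for illustration.

```python
import numpy as np


def random_sample(X, n_samples, seed=0):
    """Uniform random sampling of rows without replacement."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    if n_samples >= n:
        return X.copy()
    idx = rng.choice(n, size=n_samples, replace=False)
    return X[idx]


def pca(X, n_components=2):
    """Project X onto its top principal components via SVD of the centred data.

    Returns the projected points and the fraction of variance each component explains.
    """
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return Xc @ Vt[:n_components].T, explained[:n_components]


# Usage: shrink a large synthetic dataset, then map it to 2-D scatterplot coordinates.
rng = np.random.default_rng(42)
X = rng.normal(size=(100_000, 10))
X_small = random_sample(X, 2_000)
Y, ratio = pca(X_small, n_components=2)
print(Y.shape)  # (2000, 2)
```

Sampling before PCA means the SVD runs on a 2,000-row matrix instead of a 100,000-row one, which is the efficiency argument the abstract makes; the price is that the projection reflects the sample rather than every point.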
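The clustering component uses fuzzy c-means, in which every point has a degree of membership in each cluster instead of a single hard label. The sketch below shows the standard alternating updates (membership-weighted centres, then memberships from relative distances). The fuzzifier m = 2, Euclidean distance, random initial memberships, and the hypothetical function name `fuzzy_c_means` are assumptions; the abstract does not state the tool's actual settings.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-5, seed=0):
    """Fuzzy c-means: alternate centre and membership updates until memberships settle.

    Returns (centers, U) where U[i, j] is the membership of point i in cluster j.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U**m
        # Centres are membership-weighted means of the points.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Euclidean distance from every point to every centre, shape (n, c).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    # Recompute centres so they match the final memberships.
    Um = U**m
    centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
    return centers, U


# Usage: two well-separated blobs should receive clearly different memberships.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
centers, U = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)  # hard labels derived from the fuzzy memberships
```

In a visualization setting the full membership matrix `U` is often more useful than `labels`, for example to shade points by how strongly they belong to their cluster.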
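The abstract does not specify how the dimensions of the scatterplot matrices are re-arranged. One plausible sketch, assuming absolute Pearson correlation as the similarity between two dimensions and a greedy nearest-neighbour chain as the ordering heuristic (both assumptions, wrapped in a hypothetical `reorder_dimensions` function), places strongly related dimensions next to each other so that neighbouring cells of the matrix tend to show similar scatterplots.

```python
import numpy as np


def reorder_dimensions(X):
    """Greedy dimension ordering for a scatterplot matrix (hypothetical heuristic).

    Similarity between two dimensions is taken to be their absolute Pearson
    correlation; the order starts from the dimension with the largest total
    similarity and repeatedly appends the most similar unplaced dimension.
    """
    sim = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(sim, 0.0)  # ignore self-similarity
    order = [int(np.argmax(sim.sum(axis=1)))]
    remaining = set(range(sim.shape[0])) - set(order)
    while remaining:
        last = order[-1]
        nxt = max(remaining, key=lambda j: sim[last, j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Usage: dimensions 0/3 and 1/2 are near-copies of each other.
rng = np.random.default_rng(0)
a, b = rng.normal(size=500), rng.normal(size=500)
X = np.column_stack([a, b, b + 0.1 * rng.normal(size=500), a + 0.1 * rng.normal(size=500)])
order = reorder_dimensions(X)
X_reordered = X[:, order]  # columns in the new scatterplot-matrix order
```

A greedy chain is cheap (quadratic in the number of dimensions) but not guaranteed to be optimal; finding the best ordering is a travelling-salesman-style problem, which is why heuristics are common.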
Acknowledgement
This work is supported by Macao Polytechnic University Research Grant RP/FCA-13/2022. The corresponding author is Hong Shen.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, L., Tian, H., Shen, H. (2023). A Novel Software Tool for Fast Multiview Visualization of High-Dimensional Datasets. In: Nguyen, N.T., et al. Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2023. Communications in Computer and Information Science, vol 1863. Springer, Cham. https://doi.org/10.1007/978-3-031-42430-4_25
DOI: https://doi.org/10.1007/978-3-031-42430-4_25
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42429-8
Online ISBN: 978-3-031-42430-4
eBook Packages: Computer Science, Computer Science (R0)