A Novel Software Tool for Fast Multiview Visualization of High-Dimensional Datasets

Zhang, Luying; Tian, Hui; Shen, Hong

doi:10.1007/978-3-031-42430-4_25

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1863)

Included in the following conference series:

Asian Conference on Intelligent Information and Database Systems

Abstract

Scatterplots are a popular technique for visualizing high-dimensional datasets with the help of linear and nonlinear dimension reduction methods. These methods map the original high-dimensional dataset directly onto scatterplot points and hence incur a high computation cost. Moreover, despite many improvements in scatterplot visual effects, when the data volume is large the points mapped onto the scatterplot overlap, resulting in low-quality visualization. In this paper, we propose a novel software tool that combines five integrated components for fast multiview visualization of high-dimensional datasets: sampling, dimension reduction, clustering, multiview collaborative analysis, and dimension re-arrangement. In our tool, the sampling component reduces the size of the dataset by random sampling to gain high visualization efficiency, while the dimension reduction component reduces the dimensionality of the dataset by principal component analysis to improve visualization quality. Next, the clustering component applies fuzzy c-means clustering to the reduced dataset to reveal hidden patterns of the original dataset. Finally, multiview collaborative analysis enables users to analyse multidimensional datasets from different aspects at the same time by combining a scatterplot with scatterplot matrices. To optimize the visualization effect, we re-arrange the dimensions of the scatterplot matrices and adjust the positions of the scatterplots so that similar scatterplot points occupy adjacent positions. As a result, in comparison with existing visualization tools that apply only some of these techniques, our tool not only improves the efficiency of dimension reduction but also enhances the quality of visualization and enables more comprehensive analysis. We test our tool on different real datasets to demonstrate its effectiveness. The experimental results validate that our method is effective in both efficiency and quality of visualization.
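
As a rough illustration of the first three components named in the abstract (random sampling, principal component analysis, and fuzzy c-means clustering), the following NumPy sketch chains them together. It is not the authors' implementation: the function names, the fuzzifier m = 2, the iteration limit, and the convergence tolerance are all assumptions made for the example, and the multiview and dimension re-arrangement components are not covered.

```python
import numpy as np


def random_sample(X, n_samples, rng):
    """Uniform random sampling of rows without replacement."""
    n = min(n_samples, X.shape[0])
    idx = rng.choice(X.shape[0], size=n, replace=False)
    return X[idx]


def pca(X, n_components):
    """Project mean-centred data onto its top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, rng=None):
    """Basic fuzzy c-means; returns (centers, membership matrix U)."""
    rng = np.random.default_rng() if rng is None else rng
    # Random initial memberships; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centres are membership-weighted means of the points.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every point to every centre.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

A possible call sequence, assuming a hypothetical array `X` of shape (n_points, n_dims): `S = random_sample(X, 200, rng)`, then `Y = pca(S, 2)`, then `centers, U = fuzzy_c_means(Y, 3, rng=rng)`. Each row of `U` holds the degrees of membership of one sampled point in the three clusters, which could then be used to colour the scatterplot points.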

Acknowledgement

This work is supported by Macao Polytechnic University Research Grant RP/FCA-13/2022. The corresponding author is Hong Shen.

Author information

Authors and Affiliations

School of Computer Science, Beijing Jiaotong University, Beijing, China
Luying Zhang
School of Information and Communication Technology, Griffith University, Gold Coast, Australia
Hui Tian
Faculty of Applied Sciences, Macao Polytechnic University, Macao, China
Hong Shen
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
Hong Shen

Corresponding author

Correspondence to Hong Shen.

Editor information

Editors and Affiliations

Wrocław University of Technology, Wrocław, Poland
Ngoc Thanh Nguyen
King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
Siridech Boonsang
Iwate Prefectural University, Iwate, Japan
Hamido Fujita
Wrocław University of Science and Technology, Wrocław, Poland
Bogumiła Hnatkowska
National University of Kaohsiung, Kaohsiung, Taiwan
Tzung-Pei Hong
King Mongkut's Institute of Technology, Ladkrabang, Thailand
Kitsuchart Pasupa
Malaysia Japan International Institute of Technology, Kuala Lumpur, Malaysia
Ali Selamat

About this paper

Cite this paper

Zhang, L., Tian, H., Shen, H. (2023). A Novel Software Tool for Fast Multiview Visualization of High-Dimensional Datasets. In: Nguyen, N.T., et al. Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2023. Communications in Computer and Information Science, vol 1863. Springer, Cham. https://doi.org/10.1007/978-3-031-42430-4_25

DOI: https://doi.org/10.1007/978-3-031-42430-4_25
Published: 29 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42429-8
Online ISBN: 978-3-031-42430-4
eBook Packages: Computer Science, Computer Science (R0)
