ivhd: A Robust Linear-Time and Memory Efficient Method for Visual Exploratory Data Analysis

Dzwinel, Witold; Wcisło, Rafał

doi:10.1007/978-3-319-62416-7_25


Conference paper
First Online: 02 July 2017

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10358)

Abstract

Data embedding (DE) and graph visualization (GV) methods are highly compatible tools used in Exploratory Data Analysis for the visualization of complex data such as high-dimensional data and complex networks. However, the high computational complexity and memory load of existing DE and GV algorithms considerably hinder the visualization of truly big data consisting of as many as M ~ 10⁶⁺ data objects and N ~ 10³⁺ dimensions. Recently, we have shown that by employing only a small fraction of the distances between data objects one can obtain a very satisfactory reconstruction of the topology of complex data in 2D in linear time, O(M). In this paper, we demonstrate the high robustness of our approach. We show that even poor approximations of the nn-nearest neighbor graph representing high-dimensional data can yield acceptable data embeddings. Furthermore, some incorrectness in the nearest neighbor list can often help to improve the quality of data visualization. This robustness of our DE method, together with its high memory and time efficiency, perfectly meets the requirements of big and distributed data visualization, where finding an accurate nearest neighbor list is a great computational challenge.
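The idea of embedding from a sparse, possibly inaccurate neighbor list can be illustrated with a minimal sketch. This is not the authors' ivhd implementation: it is a generic gradient descent on a stress function restricted to the listed pairs, and every name in it (`knn_lists`, `corrupt`, `sparse_embed`, `error_rate`, the step size and iteration count) is a hypothetical choice made for illustration. The brute-force neighbor search is quadratic in M and is used only to keep the demo self-contained; only the embedding step costs O(M · nn) per iteration.

```python
import numpy as np


def knn_lists(X, nn):
    """Exact nn-nearest-neighbor lists by brute force (small demos only)."""
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :nn]


def corrupt(neigh, error_rate, rng):
    """Replace a fraction of entries with random indices (a 'poor' approximation)."""
    M = neigh.shape[0]
    noisy = neigh.copy()
    mask = rng.random(neigh.shape) < error_rate
    noisy[mask] = rng.integers(0, M, size=int(mask.sum()))
    return noisy


def sparse_embed(X, neigh, dim=2, iters=200, lr=0.05, seed=0):
    """Gradient descent on stress over the listed pairs only (assumed objective)."""
    rng = np.random.default_rng(seed)
    M, nn = neigh.shape
    i = np.repeat(np.arange(M), nn)
    j = neigh.ravel()
    keep = i != j  # drop accidental self-pairs introduced by corruption
    i, j = i[keep], j[keep]
    target = np.linalg.norm(X[i] - X[j], axis=1)  # only these distances are used
    Y = rng.normal(scale=1e-2, size=(M, dim))
    for _ in range(iters):
        diff = Y[i] - Y[j]
        dist = np.linalg.norm(diff, axis=1) + 1e-12
        g = ((dist - target) / dist)[:, None] * diff
        grad = np.zeros_like(Y)
        np.add.at(grad, i, g)   # accumulate contributions per point
        np.add.at(grad, j, -g)
        Y -= lr * grad / nn
    return Y, len(i)


rng = np.random.default_rng(0)
M, nn = 300, 5
X = rng.normal(size=(M, 20))
neigh = corrupt(knn_lists(X, nn), error_rate=0.3, rng=rng)
Y, n_pairs = sparse_embed(X, neigh)
print(Y.shape, n_pairs, "pairs used of", M * (M - 1) // 2, "possible")
```

The number of distances touched is bounded by M · nn rather than M(M − 1)/2, which is the source of the linear memory and time cost per iteration in this sketch. Varying `error_rate` gives a rough way to probe how sensitive such an embedding is to errors in the neighbor list.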

13th International Conference on Machine Learning and Data Mining (MLDM), New York, July 15-20, 2017.

Acknowledgments

This research is supported by the Polish National Center of Science (NCN) DEC-2013/09/B/ST6/01549.

Author information

Authors and Affiliations

AGH University of Science and Technology, Kraków, Poland
Witold Dzwinel & Rafał Wcisło

Corresponding author

Correspondence to Rafał Wcisło.

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, Leipzig, Sachsen, Germany
Petra Perner

About this paper

Cite this paper

Dzwinel, W., Wcisło, R. (2017). ivhd: A Robust Linear-Time and Memory Efficient Method for Visual Exploratory Data Analysis. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2017. Lecture Notes in Computer Science, vol 10358. Springer, Cham. https://doi.org/10.1007/978-3-319-62416-7_25

DOI: https://doi.org/10.1007/978-3-319-62416-7_25
Published: 02 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62415-0
Online ISBN: 978-3-319-62416-7
eBook Packages: Computer Science, Computer Science (R0)