Skip to main content

ivhd: A Robust Linear-Time and Memory Efficient Method for Visual Exploratory Data Analysis

  • Conference paper
  • First Online:
  • 3817 Accesses

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10358))

Abstract

Data embedding (DE) and graph visualization (GV) methods are very compatible tools used in Exploratory Data Analysis for visualization of complex data such as high-dimensional data and complex networks. However, high computational complexity and memory load of existing DE and GV algorithms, considerably hinders visualization of truly large and big data consisting of as many as M~106+ data objects and N~103+ dimensions. Recently, we have shown that by employing only a small fraction of distances between data objects one can obtain very satisfactory reconstruction of topology of a complex data in 2D in a linear-time O(M). In this paper, we demonstrate the high robustness of our approach. We show that even poor approximations of the nn-nearst neighbor graph, representing high-dimensional data, can yield acceptable data embeddings. Furthermore, some incorrectness in the nearest neighbor list can often be useful to improve the quality of data visualization. This robustness of our DE method, together with its high memory and time efficiency, meets perfectly the requirements of big and distributed data visualization, when finding the accurate nearest neighbor list represents a great computational challenge.

13th International Conference on Machine Learning and Data Mining MLDM, New York, July 15-20, 2017.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Tang, J., Liu, J., Zhang, M., Mei, Q.: Visualizing large-scale and high-dimensional data. In: Proceedings of the 25th International Conference on World Wide Web, pp. 287–297 (2016)

    Google Scholar 

  2. Johnson, W.P., Glenn, J.M.: Making Sense of Data I: A Practical Guide to Exploratory Data Analysis and Data Mining, 2nd edn. (2014)

    Google Scholar 

  3. Pawliczek, P., Dzwinel, W., Yuen, D.A.: Visual exploration of data by using multidimensional scaling on multi-core CPU, GPU and MPI cluster. Concurrency Comput. Pract. Experience 26(3), 662–682 (2014)

    Google Scholar 

  4. van der Maaten, L., Postma, E.O., van den Herik, H.J.: Dimensionality reduction: a comparative review. J. Mach. Learn. Res. 10, 66–71 (2009)

    Google Scholar 

  5. Hinton, G.E., Roweis, S.T.: Stochastic neighbor embedding. In: Advances in Neural Information Processing Systems, pp. 833–840 (2002)

    Google Scholar 

  6. van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2011)

    Google Scholar 

  7. van der Maaten, L.: Accelerating t-SNE using tree-based algorithms. J. Mach. Learn. Res. 15, 3221–3245 (2014)

    Google Scholar 

  8. Zhirong, Y., Peltonen, J., Kaski, S.: Optimization equivalence of divergences improves neighbor embedding. In: Proceedings of the 31st International Conference on Machine Learning, Beijing, China (2014)

    Google Scholar 

  9. Ingram, S., Munzner, T.: Dimensionality reduction for documents with nearest neighbor queries. Neurocomputing 150, 557–569 (2015)

    Google Scholar 

  10. Pezzotti, N., Höllt, T., Lelieveldt, B., Eisemann,E., Vilanova, A.: Hierarchical stochastic neighbor embedding. Comput. Graph. Forum. 35(3), 21–30 (2016)

    Google Scholar 

  11. Hu, Y., Lei, S.: Visualizing large graphs. Wiley Interdisc. Rev. Comput. Stat. 7(2), 15–136 (2015)

    Google Scholar 

  12. Dzwinel, W., Wcisło, R.: Very fast interactive visualization of large sets of high-dimensional data. Procedia Comput. Sci. 51, 572–581 (2015)

    Google Scholar 

  13. Dzwinel, W., Wcisło, R., Czech, W.: ivga: a fast force-directed method for interactive visualization of complex networks. J. Comput. Sci. (2016). in print, available on-line

    Google Scholar 

  14. Borcea, C., Streinu, I.: The number of embeddings of minimally rigid graphs. Discrete Comput. Geom. 31(2), 287–303 (2004)

    Google Scholar 

  15. Muja, M., David G.L.: Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. 36(11), 2227–2240 (2014)

    Google Scholar 

  16. Lee, K.M.: Locality-sensitive hashing techniques for nearest neighbor search. Int. J. Fuzzy Logic Intell. Syst. 12(4), 300–307 (2012)

    Google Scholar 

  17. Dzwinel, W., Wcisło, R., Matwin, S.: ivhd: a fast and simple algorithm for embedding large and high-dimensional data, working version available (2017). www.researchgate.net, doi:10.13140/RG.2.2.28959.15520/1

  18. Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is “nearest neighbor” meaningful? In: International Conference on Database Theory, pp. 217–235 (1999)

    Google Scholar 

Download references

Acknowledgments

This research is supported by the Polish National Center of Science (NCN) DEC-2013/09/B/ST6/01549.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Rafał Wcisło .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Dzwine, W., Wcisło, R. (2017). ivhd: A Robust Linear-Time and Memory Efficient Method for Visual Exploratory Data Analysis. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2017. Lecture Notes in Computer Science(), vol 10358. Springer, Cham. https://doi.org/10.1007/978-3-319-62416-7_25

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-62416-7_25

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62415-0

  • Online ISBN: 978-3-319-62416-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics