Abstract

Data has become too big to see. Yet visualization is a central way people come to understand data. We need new ways to support data visualization that scales up and out for large data, so that people can explore their data visually and interactively in real time as a means of understanding it. The five V’s of big data (value, volume, variety, velocity, and veracity) each highlight the challenges of this endeavor.

We present these challenges and a system, Skydive, that we are developing to meet them. Skydive tightly couples a database back-end with a visualization front-end in order to scale up and out. We show how hierarchical aggregation can drive this coupling, and the powerful kinds of interactive visual presentation it can support. We are preparing for the day, coming soon, when visualization becomes the sixth V of big data.

Notes

  1. This includes U.S.A. with 319 million, Mexico with 122 million, and Canada with 35 million, as of 2013.

  2. Though they are cognizant of the need, and are working toward addressing this.

  3. We shall also show ways that categorical data as measures can be accommodated.

  4. We use the same number of divisions (a power of two) along each of the dimensions, without loss of generality. It is trivial to allow for different “aspect” ratios with different numbers of divisions for different dimensions, however.

  5. For simplicity, we shall refer to strata \(s_0\), ..., \(s_l\), from the top to the bottom, respectively, forgoing the minus sign when understood in context.

  6. At least not standard versions of these.

  7. https://data.seattle.gov/.

  8. https://snap.stanford.edu/data/.

  9. Or vice versa: the bins of the t-pyramid are then hierarchically aggregated by x, y. This is commutative.

  10. Also called Morton order [22]. This is a one-dimensional, linear ordering for any multi-dimensional data.

  11. “Bins” into which no base data aggregates (“empty bins”) are never created. These numbers in the Z-order are simply skipped over.

  12. This is sometimes referred to as a linear quadtree (for 2-D) [10].

  13. The dataset is over three dimensions (\(X\), \(Y\), and \(T\)), so assume \(B = 2^{3d}\) for some \(d\), without loss of generality.
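
Notes 4, 5, and 10 through 12 together describe a sparse, Z-ordered pyramid of bins. The following is a minimal sketch of that idea in Python, not Skydive's implementation: it assumes two dimensions rather than three, integer cell coordinates, and COUNT as the aggregate. The function names (`morton2`, `base_bins`, `roll_up`, `pyramid`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def morton2(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order code.

    Bits of x land in even positions, bits of y in odd positions.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def base_bins(points, bits):
    """Count points per finest-level cell, keyed by Z-order code.

    Only cells that receive data get an entry, so empty bins are
    never created and their codes are simply absent.
    """
    bins = defaultdict(int)
    for x, y in points:
        bins[morton2(x, y, bits)] += 1
    return dict(bins)


def roll_up(bins):
    """Aggregate one stratum into the one above it.

    The four sibling cells of a 2-D parent share the same code once
    the two lowest bits are dropped.
    """
    parent = defaultdict(int)
    for code, count in bins.items():
        parent[code >> 2] += count
    return dict(parent)


def pyramid(points, bits):
    """Return all strata, top (a single bin) to bottom (finest cells)."""
    levels = [base_bins(points, bits)]
    for _ in range(bits):
        levels.append(roll_up(levels[-1]))
    return levels[::-1]


if __name__ == "__main__":
    # Four points on a 4 x 4 grid (2 bits per dimension).
    for stratum in pyramid([(0, 0), (1, 0), (3, 3), (3, 3)], bits=2):
        print(stratum)
```

Storing each stratum as a dictionary keyed by Z-order code plays the role of the linear quadtree in note 12: the hierarchy is implicit in the codes, and a parent is found by a shift rather than by following pointers. Extending the sketch to \(X\), \(Y\), and \(T\) would interleave three coordinates and shift by three bits per stratum.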

References

  1. Andrienko, N., Andrienko, G.: Exploratory analysis of spatial and temporal data: a systematic approach. Springer Science and Business Media, Heidelberg (2006)

  2. Armbrust, M., Xin, R.S., Lian, C., Huai, Y., Liu, D., Bradley, J.K., Meng, X., Kaftan, T., Franklin, M.J., Ghodsi, A., et al.: Spark SQL: relational data processing in Spark. In: Proceedings of SIGMOD, pp. 1383–1394. ACM (2015)

  3. Battle, L., Stonebraker, M., Chang, R.: Dynamic reduction of query result sets for interactive visualization. In: Proceedings of the International Conference on Big Data, Santa Clara, CA, USA, pp. 1–8 (2013)

  4. Bertin, J.: Semiology of Graphics. University of Wisconsin Press, Madison (1983)

  5. Beyer, M.A., Laney, D.: The importance of “big data”: a definition. Gartner report (2015)

  6. Cable, D.: The racial dot map. Demographics Research Group, Weldon Cooper Center for Public Service, University of Virginia, July 2013. www.coopercenter.org/demographics/Racial-Dot-Map

  7. Dijcks, J.P.: Oracle: Big data for the enterprise. Oracle White Paper (2012)

  8. Elmqvist, N., Fekete, J.D.: Hierarchical aggregation for information visualization: overview, techniques, and design guidelines. IEEE Trans. Vis. Comput. Graph. 16(3), 439–454 (2010)

  9. Erickson, J.: Private correspondence, conveyed along with permission to use by Tilmann Rabl, May 2015

  10. Gargantini, I.: An effective way to represent quadtrees. Commun. ACM 25(12), 905–910 (1982)

  11. Godfrey, P., Gryz, J., Lasek, P., Razavi, N.: Skydive: an interactive data visualization engine. In: IEEE Symposium on Large Data Analytics and Visualization, Chicago, USA, October 25–26, pp. 129–130 (2015)

  12. Godfrey, P., Gryz, J., Lasek, P.: Interactive visualization of large data sets. Technical report EECS-2015-03, York University, March 2015

  13. Godfrey, P., Gryz, J., Lasek, P., Razavi, N.: Visualization through inductive aggregation. In: Proceedings of EDBT, March 2016

  14. Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M., Pellow, F., Pirahesh, H.: Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Min. Knowl. Disc. 1(1), 29–53 (1997)

  15. Hausenblas, M., Nadeau, J.: Apache Drill: interactive ad-hoc analysis at scale. Big Data 1(2), 100–104 (2013)

  16. Jugel, U., Jerzak, Z., Hackenbroich, G., Markl, V.: Faster visual analytics through pixel-perfect aggregation. Proc. VLDB Endowment 7(13), 1705–1708 (2014)

  17. Jugel, U., Jerzak, Z., Hackenbroich, G., Markl, V.: M4: a visualization-oriented time series data aggregation. Proc. VLDB Endowment 7(10), 797–808 (2014)

  18. Laney, D.: Meta Group Res Note 6. META (2001)

  19. Liu, Z., Jiang, B., Heer, J.: imMens: real-time visual querying of big data. Comput. Graph. Forum 32(3), 421–430 (2013)

  20. Magdy, A., Aly, A.M., Mokbel, M.F., Elnikety, S., He, Y., Nath, S.: Mars: real-time spatio-temporal queries on microblogs. In: ICDE, pp. 1238–1241 (2014)

  21. Magdy, A., Mokbel, M.F., Elnikety, S., Nath, S., He, Y.: Mercury: a memory-constrained spatio-temporal real-time search on microblogs. In: ICDE, pp. 172–183. IEEE (2014)

  22. Morton, G.M.: A Computer Oriented Geodetic Data Base and A New Technique in File Sequencing. International Business Machines Company, New York (1966)

  23. Sallam, R.L., Hostmann, B., Schlegel, K., Tapadinhas, J., Parenteau, J., Oestreich, T.W.: Magic quadrant for business intelligence and analytics platforms. Gartner report (2015)

  24. Samet, H.: The quadtree and related hierarchical data structures. ACM Comput. Surv. (CSUR) 16(2), 187–260 (1984)

  25. Samet, H.: Applications of Spatial Data Structures: Computer Graphics, Image Processing, and GIS. Addison-Wesley Longman Publishing Co., Inc., Boston (1990)

  26. Samet, H.: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, San Francisco (2006)

  27. Schroeck, M., Shockley, R., Smart, J., Romero-Morales, D., Tufano, P.: Analytics: The Real-World Use of Big Data. IBM Global Business Services, Somers (2012)

  28. Shneiderman, B.: The eyes have it: a task by data type taxonomy for information visualizations. In: Proceedings of the 1996 IEEE Symposium on Visual Languages, pp. 336–343. IEEE (1996)

  29. Shneiderman, B.: Extreme visualization: squeezing a billion records into a million pixels. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 3–12. ACM (2008)

  30. Stolte, C., Tang, D., Hanrahan, P.: Polaris: a system for query, analysis, and visualization of multidimensional relational databases. IEEE Trans. Vis. Comput. Graph. 8(1), 52–65 (2002)

  31. Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Anthony, S., Liu, H., Wyckoff, P., Murthy, R.: Hive: a warehousing solution over a map-reduce framework. Proc. VLDB Endowment 2(2), 1626–1629 (2009)

  32. Tigani, J., Naidu, S.: Google BigQuery Analytics. John Wiley & Sons, Hoboken (2014)

  33. Tufte, E.: Envisioning Information. Graphics Press, Cheshire (1990)

  34. Wesley, R., Eldridge, M., Terlecki, P.T.: An analytic data engine for visualization in Tableau. In: Proceedings of SIGMOD, pp. 1185–1194. ACM (2011)

  35. Wesley, R.M.G., Terlecki, P.: Leveraging compression in the Tableau data engine. In: Proceedings of SIGMOD, pp. 563–573. ACM (2014)

  36. White, T.: Hadoop: The Definitive Guide. O’Reilly Media Inc, Sebastopol (2012)

  37. Wu, E., Battle, L., Madden, S.R.: The case for data visualization management systems: vision paper. Proc. VLDB Endowment 7(10), 903–906 (2014)

  38. Zikopoulos, P.C., Eaton, C., DeRoos, D., Deutsch, T., Lapis, G.: Understanding Big Data. McGraw-Hill, New York (2012)

Author information

Correspondence to Jarek Gryz.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Godfrey, P., Gryz, J., Lasek, P., Razavi, N. (2016). Interactive Visualization of Big Data. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_1

  • DOI: https://doi.org/10.1007/978-3-319-34099-9_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-34098-2

  • Online ISBN: 978-3-319-34099-9

  • eBook Packages: Computer Science, Computer Science (R0)
