Abstract
Analyses based on data mining and knowledge discovery techniques are not always successful, as they occasionally yield no actionable results. This is especially true in the Big-Data context, where we routinely deal with complex, heterogeneous, diverse and rapidly changing data. In this context, visual analytics plays a key role in helping both experts and users to readily comprehend and better manage analyses carried out on data stored in Infrastructure as a Service (IaaS) cloud services. To this end, humans should play a critical role in continually ascertaining the value of the processed information, and they are invariably the instigators of actionable tasks. This is facilitated by sophisticated tools that let humans interface with the data through vision and interaction. In Big-Data problems, both the scale and the nature of the data present a barrier to implementing responsive applications. In this paper, we propose a software architecture that seeks to empower Big-Data analysts with visual analytics tools atop large-scale data stored in and processed by IaaS. Our key goal is not only to deliver on-line analytic processing but also to give users the facilities to interact effectively with the underlying IaaS machinery. Although we focus on hierarchical and spatiotemporal datasets here, our proposed architecture is general and can be applied to a wide range of application domains. The core design principles of our approach are: (a) on-line processing on the cloud with Apache Spark; (b) integration of interactive programming following the notebook paradigm through Apache Zeppelin; and (c) robust operation when data and/or schema change on the fly. Through experimentation with a prototype of our suggested architecture, we demonstrate not only the viability of our approach but also its value in a use case involving publicly available crime data from the United Kingdom.
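As an illustration of design principle (c), the following is a minimal sketch in plain Python, not the Spark-based implementation described in the paper. It assumes hypothetical record fields (`month`, `region`, `crime_type`, `category`), invented region labels, and a hypothetical alias table; none of these names come from the prototype. It shows one way a roll-up over a region hierarchy might keep working when a later batch renames or omits a column.

```python
from collections import Counter

# Hypothetical records: all field names and values are invented for illustration.
# The second batch renames "crime_type" to "category", adds a field, and
# omits the category in one record, mimicking a schema change on the fly.
batch_old = [
    {"month": "2018-01", "region": "R1/R1a", "crime_type": "burglary"},
    {"month": "2018-01", "region": "R1/R1b", "crime_type": "robbery"},
]
batch_new = [
    {"month": "2018-02", "region": "R1/R1a", "category": "burglary", "outcome": "open"},
    {"month": "2018-02", "region": "R2/R2a"},
]

# Assumed alias table: logical field name -> column names seen so far.
ALIASES = {"crime_type": ("crime_type", "category")}


def get_field(record, name, default="unknown"):
    """Return the first matching column for a logical field, or a default."""
    for key in ALIASES.get(name, (name,)):
        if key in record:
            return record[key]
    return default


def roll_up(records, level):
    """Count records per (month, region prefix) at the given hierarchy depth."""
    counts = Counter()
    for rec in records:
        prefix = "/".join(get_field(rec, "region").split("/")[:level])
        counts[(get_field(rec, "month"), prefix)] += 1
    return counts


# Aggregate both batches at the top level of the region hierarchy.
counts = roll_up(batch_old + batch_new, level=1)
```

Resolving columns through a lookup with a default, rather than indexing them directly, is the design choice that lets the same aggregation run over both batches; raising `level` drills down to finer regions.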
Notes
1. Argus-Panoptes is a figure from Greek mythology: an “all-seeing” giant who served as a watchman.
2. The source code repository is available at: https://github.com/panayiotis/visual_analytics.
3. Around 200 MB in total.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Vlantis, P.I., Delis, A. (2019). On-Line Big-Data Processing for Visual Analytics with Argus-Panoptes. In: Disser, Y., Verykios, V. (eds) Algorithmic Aspects of Cloud Computing. ALGOCLOUD 2018. Lecture Notes in Computer Science, vol 11409. Springer, Cham. https://doi.org/10.1007/978-3-030-19759-9_7
DOI: https://doi.org/10.1007/978-3-030-19759-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19758-2
Online ISBN: 978-3-030-19759-9
eBook Packages: Computer Science, Computer Science (R0)