Abstract
Enabling interactive visualization over new datasets at “human speed” is key to democratizing data science and maximizing human productivity. In this work, we first explain why existing analytics infrastructures do not support interactive data exploration and outline the challenges and opportunities of building a system specifically designed for it. We then present the results of building IDEA, a new type of system for interactive data exploration that is designed to integrate seamlessly with existing data management landscapes and to let users explore their data instantly, without expensive data preparation. Finally, we discuss other important considerations for interactive data exploration systems, including benchmarking, natural language interfaces, and interactive machine learning.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Binnig, C. et al. (2019). Towards Interactive Data Exploration. In: Castellanos, M., Chrysanthis, P., Pelechrinis, K. (eds) Real-Time Business Intelligence and Analytics. BIRTE 2015, BIRTE 2016, BIRTE 2017. Lecture Notes in Business Information Processing, vol 337. Springer, Cham. https://doi.org/10.1007/978-3-030-24124-7_11
DOI: https://doi.org/10.1007/978-3-030-24124-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24123-0
Online ISBN: 978-3-030-24124-7
eBook Packages: Computer Science, Computer Science (R0)