Abstract
Crossfilter, a typical application of interactive data exploration (IDE), is widely used in data analysis, business intelligence (BI), and other fields. However, as datasets grow, crossfilter can hardly respond in real time. In this paper, we propose CrossIndex, a memory-friendly and session-aware index that supports crossfilter-style queries with low latency. We first analyze a large number of query workloads generated by previous work and find that queries in data exploration workloads are inter-dependent, i.e., they have overlapping predicates. Based on this observation, we define a group of inter-dependent queries as a session and build a hierarchical index that accelerates crossfilter-style query processing by exploiting the overlap within a session to prune unnecessary search space. Extensive experiments show that CrossIndex outperforms almost all other approaches while keeping its building cost low.
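To make the session idea concrete, the sketch below shows why overlapping predicates shrink the search space. It assumes numeric columns, conjunctive range predicates, and a flat list of rows. `Session`, `query`, `matches`, and `bin_counts` are hypothetical names. It is not CrossIndex's hierarchical index. It only illustrates the overlap property: when a follow-up query narrows every previously constrained range, its answer is a subset of the previous answer, so only the cached matches need rescanning.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Range = Tuple[float, float]  # inclusive [lo, hi] predicate on one column


def matches(row: Dict[str, float], pred: Dict[str, Range]) -> bool:
    """True if the row satisfies every range predicate (a conjunction)."""
    return all(lo <= row[col] <= hi for col, (lo, hi) in pred.items())


@dataclass
class Session:
    """Hypothetical session cache (not CrossIndex itself): keeps the row ids
    matched by the previous query so that a narrower follow-up query
    rescans only those rows instead of the whole table."""
    rows: List[Dict[str, float]]
    last_pred: Dict[str, Range] = field(default_factory=dict)
    last_ids: Optional[List[int]] = None
    scanned: int = 0  # rows examined by the most recent query

    def refines_last(self, pred: Dict[str, Range]) -> bool:
        # The new answer is a subset of the last one if every column that was
        # constrained before is still constrained, by a sub-range.
        for col, (lo, hi) in self.last_pred.items():
            if col not in pred:
                return False
            new_lo, new_hi = pred[col]
            if new_lo < lo or new_hi > hi:
                return False
        return True

    def query(self, pred: Dict[str, Range]) -> List[int]:
        if self.last_ids is not None and self.refines_last(pred):
            candidates = self.last_ids  # reuse the overlapped result
        else:
            candidates = list(range(len(self.rows)))  # fall back to a full scan
        ids = [i for i in candidates if matches(self.rows[i], pred)]
        self.scanned = len(candidates)
        self.last_pred, self.last_ids = dict(pred), ids
        return ids


def bin_counts(rows: List[Dict[str, float]], ids: List[int],
               col: str, width: float) -> Dict[int, int]:
    """Crossfilter-style linked view: histogram of `col` over the filtered rows."""
    counts: Dict[int, int] = {}
    for i in ids:
        b = int(rows[i][col] // width)
        counts[b] = counts.get(b, 0) + 1
    return dict(sorted(counts.items()))
```

For example, with 100 rows whose hypothetical `delay` column runs from 0 to 99, brushing `delay` to [10, 59] scans all 100 rows and returns 50 matches. Narrowing the brush to [20, 39] then rescans only those 50 cached rows. Widening it again falls back to a full scan. That fallback is one reason a real system needs an index rather than a single cached result.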
Acknowledgement
This work is supported by the NSFC (No. 61732004, No. U1836207 and No. 62072113), the National Key R&D Program of China (No. 2018YFB1004404) and the Zhejiang Lab (No. 2021PE0AC01).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Xia, T., Zhang, H., Jing, Y., He, Z., Zhang, K., Wang, X.S. (2022). CrossIndex: Memory-Friendly and Session-Aware Index for Supporting Crossfilter in Interactive Data Exploration. In: Bhattacharya, A., et al. Database Systems for Advanced Applications. DASFAA 2022. Lecture Notes in Computer Science, vol 13245. Springer, Cham. https://doi.org/10.1007/978-3-031-00123-9_38
DOI: https://doi.org/10.1007/978-3-031-00123-9_38
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-00122-2
Online ISBN: 978-3-031-00123-9
eBook Packages: Computer Science, Computer Science (R0)