A hybrid prediction and search approach for flexible and efficient exploration of big data

  • Regular Paper
Journal of Visualization

Abstract

This paper presents a hybrid prediction and search approach (HPS) for building visualization systems for big data. The basic idea is to train a regression model that predicts a coarse range in the dataset and then to search within that range for the target records that satisfy the query conditions. The prediction step reduces storage cost because it avoids precomputing a data structure that stores aggregate values for combinations of queryable attribute ranges. The search step, in turn, eliminates the prediction bias that is inevitable for machine learning models. Experiments on multiple open datasets show that HPS achieves query speed comparable to existing techniques and that its performance can potentially keep improving as more hardware resources are invested. In addition, because HPS returns original records instead of aggregate values, it is more flexible to use and makes it possible to construct visualization systems with display and query functions that are unavailable with existing techniques.
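The predict-then-search idea can be illustrated with a minimal sketch. The abstract does not specify the regression model, the data layout, or how the coarse range is bounded, so everything below is an assumption made for illustration: records are sorted by a single numeric key, a least-squares line maps a key to an approximate position, and the largest training error is used as a safety margin. The names `HybridIndex`, `fit_line`, and `coarse_range` are hypothetical and do not come from the paper.

```python
import math
import random


def fit_line(keys):
    """Least-squares fit of position ~ slope * key + intercept (keys are sorted)."""
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(keys))
    slope = cov / var if var > 0 else 0.0
    return slope, mean_p - slope * mean_k


class HybridIndex:
    """Illustrative predict-then-search index over one numeric attribute."""

    def __init__(self, records, key):
        if not records:
            raise ValueError("records must be non-empty")
        self.key = key
        self.records = sorted(records, key=key)
        self.keys = [key(r) for r in self.records]
        self.slope, self.intercept = fit_line(self.keys)
        # Assumption: the worst error on the stored keys bounds the prediction bias.
        self.max_err = max(
            abs(self._predict(k) - p) for p, k in enumerate(self.keys)
        )

    def _predict(self, k):
        return self.slope * k + self.intercept

    def coarse_range(self, lo, hi):
        """Prediction step: a widened [start, stop) window of positions."""
        n = len(self.records)
        start = math.floor(self._predict(lo) - self.max_err) - 1
        stop = math.ceil(self._predict(hi) + self.max_err) + 1
        return min(n, max(0, start)), max(0, min(n, stop))

    def query(self, lo, hi, extra=lambda r: True):
        """Search step: exact filtering inside the window, returning raw records."""
        start, stop = self.coarse_range(lo, hi)
        return [
            r for r in self.records[start:stop]
            if lo <= self.key(r) <= hi and extra(r)
        ]


# Synthetic demo data; the field names "t" and "v" are made up for this sketch.
random.seed(0)
data = [{"t": random.uniform(0, 1000), "v": i} for i in range(2000)]
index = HybridIndex(data, key=lambda r: r["t"])
hits = index.query(250.0, 260.0, extra=lambda r: r["v"] % 2 == 0)
```

Because the query returns the original records rather than a precomputed aggregate, any further condition (the hypothetical `extra` predicate here) or any aggregation can be applied afterwards. The exact filter inside the window is what removes the model's prediction error from the final result.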

Acknowledgements

This work is supported by the NSFC Project (61972278) and the Natural Science Foundation of Tianjin (20JCQNJC01620).

Author information

Corresponding author

Correspondence to Jie Li.

About this article

Cite this article

Li, J., Sun, Y., Lei, Z. et al. A hybrid prediction and search approach for flexible and efficient exploration of big data. J Vis 26, 457–475 (2023). https://doi.org/10.1007/s12650-022-00887-y

  • DOI: https://doi.org/10.1007/s12650-022-00887-y
