Abstract
This paper presents a hybrid prediction and search approach (HPS) for building visualization systems for big data. The basic idea is to train a regression model that predicts a coarse range in the dataset and then to search within that range for the target records that satisfy the query conditions. Prediction reduces storage cost because no data structure storing aggregate values for every combination of queryable attribute ranges has to be precomputed. Search, in turn, eliminates the prediction bias that is inevitable for machine learning models. Experiments on multiple open datasets demonstrate that HPS achieves query speed comparable to existing techniques and that its performance can potentially continue to improve as more hardware resources are invested. In addition, because HPS returns original records instead of aggregate values, it is more flexible to use, making it possible to construct visualization systems with display and query functions that are unavailable with existing techniques.
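The predict-then-search idea can be illustrated with a minimal sketch. It is not the paper's implementation: the one-dimensional sorted key, the least-squares line used as the regression model, and the names `fit_position_model`, `predict_window`, and `range_query` are all assumptions made for illustration. The model predicts roughly where a key sits in the sorted data, the window is widened by the model's worst observed error, and a binary search inside that window recovers the exact boundary, so the returned records are exact despite the biased prediction.

```python
import numpy as np

def fit_position_model(keys):
    """Fit a line mapping key value -> index in the sorted key array.

    Assumption: a linear fit stands in for the regression model; the
    abstract does not say which model HPS uses.
    """
    positions = np.arange(len(keys))
    slope, intercept = np.polyfit(keys, positions, deg=1)
    predicted = slope * keys + intercept
    # Worst-case prediction error, used later to size the search window.
    max_err = int(np.ceil(np.max(np.abs(predicted - positions))))
    return slope, intercept, max_err

def predict_window(model, key, n):
    """Predict a coarse index window that must contain the key's position."""
    slope, intercept, max_err = model
    guess = int(np.rint(slope * key + intercept))
    lo = min(max(0, guess - max_err - 1), n)
    hi = max(lo, min(n, guess + max_err + 2))
    return lo, hi

def range_query(keys, records, model, low, high):
    """Return the original records whose key lies in [low, high]."""
    n = len(keys)
    # Prediction step: coarse windows for both query bounds.
    lo_a, hi_a = predict_window(model, low, n)
    lo_b, hi_b = predict_window(model, high, n)
    # Search step: exact boundaries via binary search inside each window.
    start = lo_a + int(np.searchsorted(keys[lo_a:hi_a], low, side="left"))
    end = lo_b + int(np.searchsorted(keys[lo_b:hi_b], high, side="right"))
    return records[start:end]
```

In this sketch the only stored state besides the sorted data is the three model parameters, which mirrors the storage argument above, and the query returns raw records rather than a precomputed aggregate.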
Acknowledgements
This work is supported by the NSFC Project (61972278) and Natural Science Foundation of Tianjin (20JCQNJC01620).
About this article
Cite this article
Li, J., Sun, Y., Lei, Z. et al. A hybrid prediction and search approach for flexible and efficient exploration of big data. J Vis 26, 457–475 (2023). https://doi.org/10.1007/s12650-022-00887-y
DOI: https://doi.org/10.1007/s12650-022-00887-y