Abstract
Sampling-based Approximate Query Processing (AQP) is a promising approach for timely and cost-effective analytics over big data. There are two main methods for estimating the error of approximate query results: the analytical method and the bootstrap method. Although the bootstrap method is much more general than the analytical method, it is rarely used in existing AQP systems because of its high computation overhead. In this paper, we propose to use the GPU, together with a series of optimization mechanisms, to accelerate bootstrap, thus making it feasible to address the essential error estimation problem for AQP with bootstrap. In addition, since modern GPUs have increasingly large memory capacity, we can store samples in GPU memory and use the GPU to accelerate the execution of AQP queries as well as the bootstrap-based error estimation. Extensive experiments on the SSB benchmark show that our GPU-accelerated method is up to about two orders of magnitude faster than the CPU method.
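To make the bootstrap idea concrete, the following is a minimal CPU-only sketch of percentile-bootstrap error estimation for an approximate AVG query over a uniform sample. It is not the paper's GPU implementation. The function name `bootstrap_ci`, its parameters, and the synthetic data are assumptions introduced only for illustration.

```python
import random
import statistics


def bootstrap_ci(sample, estimator, num_resamples=1000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for estimator(sample).

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(num_resamples):
        # Draw n rows from the sample with replacement and re-run the aggregate.
        resample = rng.choices(sample, k=n)
        estimates.append(estimator(resample))
    estimates.sort()
    alpha = 1.0 - confidence
    lo_idx = int((alpha / 2) * num_resamples)
    hi_idx = min(num_resamples - 1, int((1 - alpha / 2) * num_resamples))
    # Return the point estimate and the empirical lower/upper percentiles.
    return estimator(sample), estimates[lo_idx], estimates[hi_idx]


# Synthetic "table" column and a 1% uniform sample of it (assumed data).
data_rng = random.Random(42)
population = [data_rng.uniform(0, 100) for _ in range(100_000)]
sample = data_rng.sample(population, 1_000)

# Approximate AVG(column) with a 95% bootstrap confidence interval.
point, low, high = bootstrap_ci(sample, statistics.fmean)
print(f"AVG ~ {point:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Each resample re-evaluates the whole aggregate, so the cost grows with the number of resamples times the sample size. The resamples are independent of one another, which is what makes the loop a natural candidate for parallel execution.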
Acknowledgement
This work is funded by the NSFC (No. 61732004 and No. 62072113), the National Key R&D Program of China (No. 2018YFB1004404) and the Zhejiang Lab (No. 2021PE0AC01).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhao, H., Zhang, H., Jing, Y., Zhang, K., He, Z., Wang, X.S. (2022). Revisiting Approximate Query Processing and Bootstrap Error Estimation on GPU. In: Bhattacharya, A., et al. Database Systems for Advanced Applications. DASFAA 2022. Lecture Notes in Computer Science, vol 13245. Springer, Cham. https://doi.org/10.1007/978-3-031-00123-9_5
DOI: https://doi.org/10.1007/978-3-031-00123-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-00122-2
Online ISBN: 978-3-031-00123-9
eBook Packages: Computer Science (R0)