Revisiting Approximate Query Processing and Bootstrap Error Estimation on GPU

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13245)

Abstract

Sampling-based Approximate Query Processing (AQP) is a promising approach for timely and cost-effective analytics over big data. There are two main methods for estimating the errors of approximate query results: the analytical method and the bootstrap method. Although the bootstrap method is much more general than the analytical method, it is rarely used in existing AQP systems because of its high computational overhead. In this paper, we propose using the GPU together with a series of optimization mechanisms to accelerate bootstrap, making it feasible to address the essential error estimation problem for AQP with bootstrap. Moreover, because modern GPUs have increasingly large memory capacity, we can store samples in GPU memory and use the GPU to accelerate the execution of AQP queries as well as the bootstrap-based error estimation. Extensive experiments on the SSB benchmark show that our GPU-accelerated method is up to about two orders of magnitude faster than the CPU method.
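
As a rough illustration of bootstrap error estimation for a sampled aggregate, the following minimal CPU-only Python sketch computes a percentile bootstrap confidence interval for an approximate AVG. It is not the paper's GPU implementation: the function name `bootstrap_ci`, its parameters, and the synthetic data are assumptions made for this example. Each resample is independent of the others, which is the property that makes the procedure a natural candidate for parallel hardware.

```python
import random
import statistics


def bootstrap_ci(sample, estimator, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for estimator(sample).

    Hypothetical helper for illustration only; not taken from the paper.
    """
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(n_resamples):
        # Draw n rows with replacement and re-run the aggregate on them.
        resample = rng.choices(sample, k=n)
        replicates.append(estimator(resample))
    replicates.sort()
    # Read the interval endpoints off the sorted replicate estimates.
    lo = replicates[int((alpha / 2) * n_resamples)]
    hi = replicates[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return estimator(sample), lo, hi


# Synthetic stand-in for a fact-table column and a 1% uniform sample of it.
data_rng = random.Random(42)
table = [data_rng.uniform(0, 100) for _ in range(100_000)]
sample = data_rng.sample(table, 1_000)

estimate, lo, hi = bootstrap_ci(sample, statistics.fmean)
print(f"approximate AVG = {estimate:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

With 1,000 resamples of a 1,000-row sample, the loop re-evaluates the aggregate over a million rows for a single query, which hints at why the overhead matters on a CPU.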

Acknowledgement

This work is funded by the NSFC (No. 61732004 and No. 62072113), the National Key R&D Program of China (No. 2018YFB1004404) and the Zhejiang Lab (No. 2021PE0AC01).

Author information

Corresponding author

Correspondence to Yinan Jing.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhao, H., Zhang, H., Jing, Y., Zhang, K., He, Z., Wang, X.S. (2022). Revisiting Approximate Query Processing and Bootstrap Error Estimation on GPU. In: Bhattacharya, A., et al. Database Systems for Advanced Applications. DASFAA 2022. Lecture Notes in Computer Science, vol 13245. Springer, Cham. https://doi.org/10.1007/978-3-031-00123-9_5

  • DOI: https://doi.org/10.1007/978-3-031-00123-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-00122-2

  • Online ISBN: 978-3-031-00123-9

  • eBook Packages: Computer Science (R0)
