
Robust Query Execution Time Prediction for Concurrent Workloads on Massive Parallel Processing Databases

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11606)

Abstract

Reliable query execution time prediction is a desirable feature for modern databases because it can greatly ease database administration and is the foundation of various database management and automation tools. Most existing studies on modeling query execution time assume that each individual query is executed as a sequence of serialized steps. However, with increasing data volumes and the demand for low query latency, large-scale databases have been adopting the massively parallel processing (MPP) architecture. In this paper, we present a novel machine-learning-based approach for building a robust model that estimates query execution time by considering both query-based statistics and real-time system attributes. The experimental results demonstrate that our approach can reliably predict query execution time in both idle and noisy environments at random levels of concurrency. In addition, we found that both query and system factors are crucial for making stable predictions.
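The abstract's central idea, feeding query-based statistics and real-time system attributes into one model, can be illustrated with a minimal sketch. The abstract does not say which learner the paper uses, so a k-nearest-neighbors regressor is swapped in here purely for illustration. The four feature names (`est_rows`, `est_cost`, `concurrent_queries`, `cpu_util`), the class names, and the training history are hypothetical assumptions, not the paper's model, features, or data.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """One executed query. All feature names here are hypothetical placeholders."""
    # Query-based statistics, e.g. taken from the optimizer's plan estimates
    est_rows: float
    est_cost: float
    # Real-time system attributes sampled when the query is submitted
    concurrent_queries: float
    cpu_util: float
    # Measured execution time in seconds (the training target)
    exec_time: float = 0.0

    def features(self) -> List[float]:
        return [self.est_rows, self.est_cost, self.concurrent_queries, self.cpu_util]


class KNNTimePredictor:
    """k-nearest-neighbors regressor over standardized query + system features."""

    def __init__(self, k: int = 3):
        self.k = k

    def fit(self, observations: List[Observation]) -> "KNNTimePredictor":
        columns = list(zip(*(o.features() for o in observations)))
        self.means = [sum(c) / len(c) for c in columns]
        # Population std per feature; fall back to 1.0 for constant features
        self.stds = [
            math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(columns, self.means)
        ]
        self.train = [(self._scale(o.features()), o.exec_time) for o in observations]
        return self

    def _scale(self, x: List[float]) -> List[float]:
        # Standardize so large-magnitude features (e.g. row counts) do not dominate
        return [(v - m) / s for v, m, s in zip(x, self.means, self.stds)]

    def predict(self, obs: Observation) -> float:
        q = self._scale(obs.features())
        nearest = sorted(self.train, key=lambda pair: math.dist(q, pair[0]))[: self.k]
        # Average the execution times of the k most similar past executions
        return sum(t for _, t in nearest) / len(nearest)


# Hypothetical history: two queries, each observed at four concurrency levels
history = [
    Observation(1e6, 500, 1, 0.20, 2.0),
    Observation(1e6, 500, 2, 0.35, 3.0),
    Observation(1e6, 500, 4, 0.60, 5.0),
    Observation(1e6, 500, 8, 0.90, 9.0),
    Observation(1e7, 5000, 1, 0.25, 20.0),
    Observation(1e7, 5000, 2, 0.40, 30.0),
    Observation(1e7, 5000, 4, 0.65, 50.0),
    Observation(1e7, 5000, 8, 0.95, 90.0),
]
model = KNNTimePredictor(k=2).fit(history)
# Same query statistics, different sampled system load
idle = model.predict(Observation(1e6, 500, 1, 0.22))
busy = model.predict(Observation(1e6, 500, 8, 0.90))
print(idle, busy)
```

With identical query statistics, the prediction moves with the sampled system load, which is the kind of effect the abstract attributes to system factors. k-NN keeps the sketch dependency-free; another regressor could be substituted without changing the feature layout.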

Author information

Correspondence to Pengyu Hong.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zheng, Z., Bei, Y., Sun, H., Hong, P. (2019). Robust Query Execution Time Prediction for Concurrent Workloads on Massive Parallel Processing Databases. In: Wotawa, F., Friedrich, G., Pill, I., Koitz-Hristov, R., Ali, M. (eds) Advances and Trends in Artificial Intelligence. From Theory to Practice. IEA/AIE 2019. Lecture Notes in Computer Science, vol 11606. Springer, Cham. https://doi.org/10.1007/978-3-030-22999-3_6

  • DOI: https://doi.org/10.1007/978-3-030-22999-3_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-22998-6

  • Online ISBN: 978-3-030-22999-3

  • eBook Packages: Computer Science, Computer Science (R0)