Automatic single table storage structure selection for hybrid workload

  • Regular Paper

Knowledge and Information Systems

Abstract

In database systems, the design of the storage engine and data model directly affects query performance. Database users must therefore select a storage engine and design a data model that suit the workload they face. Under a hybrid workload, however, the query set changes dynamically, and so does the storage structure best suited to it. Motivated by this, we propose an automatic storage structure selection system based on a learned cost model, which dynamically selects an optimised storage structure for the database under hybrid workloads. The system uses a machine learning method to build a cost model for the storage engine, together with an algorithm that generates column-oriented data layouts. Experimental results show that the proposed system selects the optimal combination of storage engine and data model for the current workload and substantially improves performance over the default storage structure. The system is also designed to be compatible with different storage engines, which makes it easy to use in practical applications.
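The size of the layout search space helps explain why a layout generation algorithm is needed instead of exhaustive enumeration. Assuming a column-oriented layout is modelled as a partition of a table's columns into disjoint column groups, the number of possible layouts for n columns is the Bell number B(n). The sketch below computes Bell numbers with the Bell triangle; the function name `bell_numbers` is ours, not the paper's.

```python
from typing import List


def bell_numbers(n_max: int) -> List[int]:
    """Return [B(0), B(1), ..., B(n_max)], computed with the Bell triangle."""
    bells = [1]  # B(0) = 1: the empty set has exactly one partition
    row = [1]    # row 0 of the Bell triangle
    for _ in range(n_max):
        # A new row starts with the last entry of the previous row; each following
        # entry is the sum of its left neighbour and the entry above that neighbour.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])  # the first entry of row n is B(n)
    return bells


if __name__ == "__main__":
    bells = bell_numbers(16)
    for n in (4, 8, 16):
        print(f"{n} columns: {bells[n]:,} possible column groupings")
```

Under this assumption, a 4-column table already has 15 candidate groupings, and a 16-column table has 10,480,142,147, so evaluating every candidate with a cost model quickly becomes impractical.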
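The abstract does not describe how the column-oriented layout generation algorithm works. As a hedged illustration only, the sketch below implements a generic greedy merge heuristic, not the authors' algorithm: start with one group per column and repeatedly apply the pairwise merge that most lowers an estimated workload cost, stopping when no merge helps. The cost formula (each query reads the full width of every group it touches, plus a fixed per-group overhead standing in for tuple reconstruction), the `group_overhead` default, the toy column widths, and all function names are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Group = FrozenSet[str]
Workload = List[Tuple[Group, float]]  # (columns a query reads, relative frequency)


def layout_cost(layout: List[Group], widths: Dict[str, int],
                workload: Workload, group_overhead: float) -> float:
    """Assumed cost: each query reads every group it touches in full, plus a fixed per-group overhead."""
    total = 0.0
    for columns, freq in workload:
        for group in layout:
            if group & columns:
                total += freq * (sum(widths[c] for c in group) + group_overhead)
    return total


def greedy_column_groups(widths: Dict[str, int], workload: Workload,
                         group_overhead: float = 8.0) -> Tuple[List[Group], float]:
    """Hypothetical greedy heuristic: merge column groups while the estimated cost drops."""
    layout = [frozenset([c]) for c in widths]  # start with one group per column
    best_cost = layout_cost(layout, widths, workload, group_overhead)
    while True:
        best_layout = None
        for a, b in combinations(layout, 2):
            candidate = [g for g in layout if g != a and g != b] + [a | b]
            cost = layout_cost(candidate, widths, workload, group_overhead)
            if cost < best_cost:  # remember the single most beneficial merge
                best_cost, best_layout = cost, candidate
        if best_layout is None:  # no merge improves the estimate: stop
            return layout, best_cost
        layout = best_layout


if __name__ == "__main__":
    widths = {"id": 4, "name": 32, "price": 8, "qty": 4}  # toy bytes per value
    workload = [
        (frozenset({"price", "qty"}), 10.0),                # frequent narrow scan
        (frozenset({"id", "name", "price", "qty"}), 1.0),   # occasional full-row read
    ]
    layout, cost = greedy_column_groups(widths, workload)
    print(sorted(sorted(g) for g in layout), cost)
```

With these toy inputs the heuristic first merges {price, qty}, then {id, name}, and stops at an estimated cost of 264.0 because merging the two remaining groups would make the frequent narrow scan read the wide `name` column. A greedy search like this examines only O(n²) merges per step rather than all B(n) groupings, at the price of possibly missing the optimum.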
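Similarly, the abstract states only that a machine learning method is used to build the storage engine's cost model and that the system picks a combination of storage engine and data model. The sketch below swaps in k-nearest-neighbour regression as a simple stand-in learner: one model per candidate combination maps workload features to a predicted cost, and the candidate with the lowest prediction is selected. The feature set (fraction of transactional queries, average fraction of columns read), the candidate names, the class and function names, and the training numbers are all hypothetical and invented for illustration; they are not measurements from the paper.

```python
import math
from typing import Dict, Iterable, List, Tuple

Features = Tuple[float, ...]


class KNNCostModel:
    """Hypothetical learned cost model: mean cost of the k nearest training workloads."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.samples: List[Tuple[Features, float]] = []

    def fit(self, samples: Iterable[Tuple[Features, float]]) -> "KNNCostModel":
        # k-NN "training" just stores (workload features, measured cost) pairs.
        self.samples = list(samples)
        return self

    def predict(self, features: Features) -> float:
        # Euclidean distance in feature space (math.dist requires Python 3.8+).
        nearest = sorted(self.samples, key=lambda s: math.dist(s[0], features))[: self.k]
        return sum(cost for _, cost in nearest) / len(nearest)


def choose_structure(models: Dict[str, KNNCostModel],
                     features: Features) -> Tuple[str, Dict[str, float]]:
    """Return the candidate with the lowest predicted cost, plus all predictions."""
    predictions = {name: model.predict(features) for name, model in models.items()}
    best = min(predictions, key=predictions.__getitem__)
    return best, predictions


if __name__ == "__main__":
    # Toy data: features = (fraction of transactional queries, avg. fraction of columns read).
    models = {
        "row-oriented": KNNCostModel(k=2).fit(
            [((0.9, 0.8), 1.0), ((0.5, 0.5), 2.0), ((0.1, 0.2), 4.0)]),
        "column-oriented": KNNCostModel(k=2).fit(
            [((0.9, 0.8), 3.0), ((0.5, 0.5), 2.5), ((0.1, 0.2), 1.0)]),
    }
    best, predictions = choose_structure(models, (0.2, 0.3))
    print(best, predictions)
```

For the toy workload (0.2, 0.3), the row-oriented model predicts 3.0 and the column-oriented model predicts 1.75, so the column-oriented candidate is chosen. Keeping one model per candidate is just one possible design; a single model that takes the candidate's description as extra input features would be an equally reasonable alternative.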

Figures: Fig. 1 to Fig. 9

Notes

  1. https://en.wikipedia.org/wiki/Bell_number.

  2. http://www.tpc.org/tpch/.

Acknowledgements

This work was supported by NSFC Grants 62232005, 62202126 and U1866602.

Author information

Corresponding author

Correspondence to Hongzhi Wang.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, H., Wei, Y. & Yan, H. Automatic single table storage structure selection for hybrid workload. Knowl Inf Syst 65, 4713–4739 (2023). https://doi.org/10.1007/s10115-023-01913-7

  • DOI: https://doi.org/10.1007/s10115-023-01913-7
