Automatic single table storage structure selection for hybrid workload

Wang, Hongzhi; Wei, Yan; Yan, Hao

doi:10.1007/s10115-023-01913-7


Regular Paper
Published: 07 June 2023

Volume 65, pages 4713–4739 (2023)

Knowledge and Information Systems


Abstract

The design of a database's storage engine and data model directly affects its query performance. Database users therefore need to select a storage engine and design a data model that suit their workload. Under a hybrid workload, however, the query set changes dynamically, and so does the storage structure that serves it best. Motivated by this, we propose an automatic storage structure selection system based on a learned cost model, which dynamically selects an optimised storage structure for the database under hybrid workloads. The system uses a machine learning method to build a cost model for the storage engine, together with a column-oriented data layout generation algorithm. Experimental results show that the proposed system can choose the optimal combination of storage engine and data model for the current workload, greatly improving performance over the default storage structure. The system is also designed to be compatible with different storage engines, which eases its use in practical applications.
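The abstract does not describe the cost model or the layout algorithm in detail, so the Python sketch below only illustrates the general idea under stated assumptions: each candidate storage structure is an (engine, layout) pair, a learned model predicts its cost from workload features, and the system picks the cheapest candidate. The k-nearest-neighbour regressor, the three-number workload featurisation, the rule that groups columns accessed by exactly the same queries, and every name and cost figure in it (`Query`, `KNNCostModel`, `select_storage`, `column_groups`, `engine_a`, `engine_b`) are hypothetical stand-ins, not the authors' method.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Query:
    kind: str            # assumed query classes: "point", "scan" or "update"
    columns: frozenset   # columns of the single table that the query touches


def workload_features(queries):
    """Assumed featurisation: the share of point, scan and update queries."""
    n = len(queries)
    return tuple(sum(q.kind == k for q in queries) / n for k in ("point", "scan", "update"))


def column_groups(columns, queries):
    """Hypothetical layout rule: group columns touched by exactly the same queries.

    A query that reads one column of a group reads all of them, so no query
    is forced to fetch a column it does not need from a shared group.
    """
    by_signature = defaultdict(list)
    for col in columns:
        touched_by = frozenset(i for i, q in enumerate(queries) if col in q.columns)
        by_signature[touched_by].append(col)
    return sorted(tuple(sorted(cols)) for cols in by_signature.values())


class KNNCostModel:
    """Stand-in learned cost model: k-NN regression from features to measured cost."""

    def __init__(self, k=3):
        self.k = k
        self.samples = defaultdict(list)  # config -> [(features, cost), ...]

    def observe(self, config, features, cost):
        """Record one measured run of `config` under a workload with `features`."""
        self.samples[config].append((features, cost))

    def predict(self, config, features):
        """Average the costs of the k closest recorded workloads for `config`."""
        nearest = sorted(self.samples[config], key=lambda s: dist(s[0], features))[: self.k]
        if not nearest:
            raise ValueError(f"no training samples for {config!r}")
        return sum(cost for _, cost in nearest) / len(nearest)


def select_storage(model, configs, queries):
    """Pick the candidate (engine, layout) with the lowest predicted cost."""
    features = workload_features(queries)
    return min(configs, key=lambda c: model.predict(c, features))


if __name__ == "__main__":
    model = KNNCostModel(k=1)
    row, col = ("engine_a", "row"), ("engine_b", "column")  # hypothetical candidates
    # Made-up training costs, for illustration only.
    model.observe(row, (0.9, 0.0, 0.1), 1.0)
    model.observe(row, (0.1, 0.9, 0.0), 8.0)
    model.observe(col, (0.9, 0.0, 0.1), 5.0)
    model.observe(col, (0.1, 0.9, 0.0), 2.0)
    scans = [Query("scan", frozenset({"a", "b"}))] * 9 + [Query("point", frozenset({"a", "b", "c"}))]
    print(select_storage(model, [row, col], scans))  # ('engine_b', 'column')
    print(column_groups(["a", "b", "c"], scans))     # [('a', 'b'), ('c',)]
```

Swapping the k-NN stand-in for a gradient-boosted regressor or any other learner only changes `KNNCostModel`; the selection step stays an argmin over predicted costs, so it can be re-run whenever the hybrid workload shifts.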





Acknowledgements

This work was supported by NSFC grants 62232005, 62202126 and U1866602.

Author information

Yan Wei and Hao Yan have contributed equally to this work.

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Hongzhi Wang, Yan Wei & Hao Yan


Corresponding author

Correspondence to Hongzhi Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Wang, H., Wei, Y. & Yan, H. Automatic single table storage structure selection for hybrid workload. Knowl Inf Syst 65, 4713–4739 (2023). https://doi.org/10.1007/s10115-023-01913-7


Received: 18 January 2022
Revised: 08 May 2023
Accepted: 22 May 2023
Published: 07 June 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s10115-023-01913-7

