Abstract
Recently, researchers have begun to explore how databases can be augmented with machine learning. A recent study showed that deep learning can be used to model index structures. Such a learned approach assumes that the data in the database follows some particular distribution. However, we argue that real-world data may not follow a specific pattern and that the learned models are usually too complicated, which makes training expensive. In this paper, we show that linear models, used within a hybrid method, can achieve the same precision as models trained by deep learning and are easier to maintain. Based on this, we propose a hybrid method that combines the traditional B-tree with linear regression. The hybrid method retrieves the data and checks whether it can benefit from the learned approach. We have implemented prototype hybrid indexes in Postgres. A comparison with the B-tree shows that our method is more efficient in index construction, insertion, and query execution.
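To make the idea concrete, the following is a minimal sketch of one way such a hybrid could work. It is not the authors' Postgres implementation. It rests on several assumptions that do not come from the paper: the class name `HybridIndex`, the `max_error` threshold, the use of a sorted Python list with `bisect` as a stand-in for a B-tree, and the rule "use the linear model only if its worst-case position error is small" as the check for whether the data benefits from learning.

```python
import bisect


class HybridIndex:
    """Toy hybrid index over a sorted list of numeric keys (hypothetical sketch)."""

    def __init__(self, keys, max_error=8):
        # max_error is an assumed tuning knob, not a parameter from the paper.
        self.keys = sorted(keys)
        n = len(self.keys)
        self.use_model = False
        self.slope = self.intercept = 0.0
        self.err = 0
        if n >= 2:
            # Ordinary least-squares fit: position ~ slope * key + intercept.
            mean_k = sum(self.keys) / n
            mean_p = (n - 1) / 2
            var = sum((k - mean_k) ** 2 for k in self.keys)
            if var > 0:
                cov = sum((k - mean_k) * (i - mean_p)
                          for i, k in enumerate(self.keys))
                self.slope = cov / var
                self.intercept = mean_p - self.slope * mean_k
                # Worst-case distance between predicted and true position.
                self.err = max(abs(self._predict(k) - i)
                               for i, k in enumerate(self.keys))
                # Keep the model only if the data is close enough to linear.
                self.use_model = self.err <= max_error

    def _predict(self, key):
        pos = round(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key, or None if it is absent."""
        if self.use_model:
            # Search only the window guaranteed by the recorded error bound.
            guess = self._predict(key)
            lo = max(guess - self.err, 0)
            hi = min(guess + self.err + 1, len(self.keys))
        else:
            # Fallback: full binary search, standing in for a B-tree descent.
            lo, hi = 0, len(self.keys)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None
```

In this sketch, evenly spaced keys such as `range(0, 1000, 10)` are served by the linear model, while heavily skewed keys such as powers of two exceed the error threshold and fall back to the plain search. A real system would apply the check per index or per partition and would also have to handle insertions, which the sketch ignores.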
Acknowledgment
This work was supported by the National Key R&D Program of China (No. 2017YFC0803700), an NSFC grant (No. 61532021), the Shanghai Knowledge Service Platform Project (No. ZF1213), and SHEITC.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Qu, W., Wang, X., Li, J., Li, X. (2019). Hybrid Indexes by Exploring Traditional B-Tree and Linear Regression. In: Ni, W., Wang, X., Song, W., Li, Y. (eds) Web Information Systems and Applications. WISA 2019. Lecture Notes in Computer Science, vol 11817. Springer, Cham. https://doi.org/10.1007/978-3-030-30952-7_61
DOI: https://doi.org/10.1007/978-3-030-30952-7_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30951-0
Online ISBN: 978-3-030-30952-7
eBook Packages: Computer Science, Computer Science (R0)