Chunk-oriented dimension ordering for efficient range query processing on sparse multidimensional data

Han, Shuai; Liu, Xianmin; Li, Jianzhong

doi:10.1007/s11280-022-01098-z

Published: 09 September 2022

Volume 26, pages 1395–1433, (2023)

World Wide Web

Abstract

Range query processing is of vital importance in array data management. In many applications, evaluating range queries efficiently on sparse multidimensional data is challenging. Range query performance depends heavily on the dimension order used, so optimizing that order is essential. Prior work focuses only on optimizing a single global dimension order for the data. However, the data distribution and the query distribution may differ across different parts of the data, so a global dimension order is too coarse-grained to achieve good query performance, and a fine-grained dimension order optimization is needed. In this paper, to exploit the optimization opportunities of fine-grained dimension ordering for range query processing, we first design a two-level linearization method for storing and querying sparse multidimensional data. Unlike previous work, which usually uses a global dimension order, the two-level linearization method allows the dimension order to be specified separately for different parts of the data, called chunks. To achieve fine-grained dimension order optimization, we present the chunk-oriented dimension ordering problem for the first time and propose workload-driven dimension ordering algorithms for the uniform case and the non-uniform independent case, respectively. Furthermore, to cope with changing workloads in practical applications, we design a dynamic dimension reordering method that tracks query trends over time and avoids query performance degradation. Finally, experiments are conducted on both synthetic and real-life data to illustrate the effectiveness of our method.
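
To make the idea of per-chunk dimension orders concrete, the following is a minimal illustrative sketch in Python, not the algorithm from the paper. It assumes a regular grid of equally sized chunks and dense offsets inside each chunk (sparsity is ignored), and all names (`linearize`, `two_level_address`, `query_runs`, `chunk_orders`) are hypothetical. The first level picks the chunk, the second level linearizes the cell inside the chunk using that chunk's own dimension order, and the number of contiguous offset runs a range query touches serves as a rough proxy for its cost.

```python
from itertools import product


def linearize(coord, shape, order):
    """Map a coordinate to an offset; `order` lists dimensions from
    slowest-varying to fastest-varying."""
    offset = 0
    for d in order:
        offset = offset * shape[d] + coord[d]
    return offset


def two_level_address(coord, chunk_shape, grid_shape, chunk_orders):
    """Return (chunk_id, offset_in_chunk) for a global coordinate.

    `chunk_orders` maps a chunk id to that chunk's dimension order;
    chunks without an entry fall back to the natural order (assumption).
    """
    chunk_coord = tuple(c // s for c, s in zip(coord, chunk_shape))
    local = tuple(c % s for c, s in zip(coord, chunk_shape))
    # First level: chunks themselves are laid out in natural order.
    chunk_id = linearize(chunk_coord, grid_shape, range(len(grid_shape)))
    # Second level: each chunk may use its own dimension order.
    order = chunk_orders.get(chunk_id, tuple(range(len(chunk_shape))))
    return chunk_id, linearize(local, chunk_shape, order)


def count_runs(offsets):
    """Count maximal runs of consecutive offsets."""
    s = sorted(offsets)
    if not s:
        return 0
    return 1 + sum(1 for a, b in zip(s, s[1:]) if b != a + 1)


def query_runs(chunk_shape, order, lo, hi):
    """Runs touched inside one chunk by the inclusive range query [lo, hi]."""
    cells = product(*(range(l, h + 1) for l, h in zip(lo, hi)))
    return count_runs(linearize(c, chunk_shape, order) for c in cells)


if __name__ == "__main__":
    # A query spanning all of dimension 0 at a single value of dimension 1.
    print(query_runs((4, 4), (0, 1), (0, 1), (3, 1)))  # dimension 1 fastest
    print(query_runs((4, 4), (1, 0), (0, 1), (3, 1)))  # dimension 0 fastest
    print(two_level_address((5, 2), (4, 4), (2, 2), {2: (1, 0)}))
```

In this toy setting, the same query touches four separate runs under one order and a single run under the other, which is why letting each chunk choose its order according to the queries it receives can pay off when workloads differ across chunks.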

Acknowledgements

This work is supported by the Key Program of the National Natural Science Foundation of China under grant No. 61832003 and the Major Program of the National Natural Science Foundation of China under grant No. U1811461.

Author information

Authors and Affiliations

Department of computing, Harbin Institute of Technology, Xidazhi Street, Harbin, 150001, Heilongjiang Province, China
Shuai Han, Xianmin Liu & Jianzhong Li

Contributions

Shuai Han wrote the main manuscript text. Xianmin Liu and Jianzhong Li suggested revisions to the paper. All authors reviewed the manuscript.

Corresponding author

Correspondence to Jianzhong Li.

Ethics declarations

Conflict of interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, S., Liu, X. & Li, J. Chunk-oriented dimension ordering for efficient range query processing on sparse multidimensional data. World Wide Web 26, 1395–1433 (2023). https://doi.org/10.1007/s11280-022-01098-z

Received: 09 May 2022
Revised: 20 August 2022
Accepted: 29 August 2022
Published: 09 September 2022
Issue Date: July 2023
DOI: https://doi.org/10.1007/s11280-022-01098-z
