Abstract
Range query processing is of vital importance in array data management. Efficient range query evaluation over sparse multidimensional data is challenging in many applications. Range query performance depends heavily on the dimension order used, so optimizing the dimension order is essential. Prior work focuses only on optimizing a global dimension order for the data. However, the data distribution and the query distribution can differ across different parts of the data, so a global dimension order is too coarse-grained to achieve good query performance, and a fine-grained dimension order optimization is needed. In this paper, to exploit the optimization opportunities of fine-grained dimension ordering for range query processing, we first design a two-level linearization method for storing and querying sparse multidimensional data. Unlike previous work, which usually uses a global dimension order, the two-level linearization method allows dimension orders to be specified separately for different parts of the data, called chunks. To realize fine-grained dimension order optimization, we present the chunk-oriented dimension ordering problem for the first time and propose workload-driven dimension ordering algorithms for the uniform case and the non-uniform independent case, respectively. Furthermore, to cope with changing workloads in practical applications, we design a dynamic dimension reordering method that tracks query trends over time and avoids query performance degradation. Finally, experiments are conducted on both synthetic and real-life data to demonstrate the effectiveness of our method.
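To make the role of dimension order concrete, the following is a minimal Python sketch of a two-level addressing scheme with a per-chunk dimension order. It is an illustration under stated assumptions, not the paper's algorithm: the function names (`linearize`, `two_level_address`, `query_runs`), the dense in-chunk offsets, the fixed order used for chunk IDs, and the use of contiguous runs as a proxy for query cost are all hypothetical choices made for this example.

```python
from itertools import product


def linearize(coord, shape, order):
    """Offset of `coord` in a dense block of `shape`.

    `order` lists dimensions from slowest-varying to fastest-varying.
    """
    offset = 0
    for dim in order:
        offset = offset * shape[dim] + coord[dim]
    return offset


def two_level_address(cell, chunk_shape, grid_shape, chunk_orders, default_order):
    """Return (chunk_id, in_chunk_offset) for a cell.

    Level 1: the chunk is located in the chunk grid using `default_order`.
    Level 2: the cell is placed inside its chunk using that chunk's own
    dimension order (assumed to be stored in the dict `chunk_orders`).
    """
    chunk_coord = tuple(c // s for c, s in zip(cell, chunk_shape))
    local = tuple(c % s for c, s in zip(cell, chunk_shape))
    chunk_id = linearize(chunk_coord, grid_shape, default_order)
    order = chunk_orders.get(chunk_id, default_order)
    return chunk_id, linearize(local, chunk_shape, order)


def count_runs(offsets):
    """Number of maximal runs of consecutive offsets (a rough proxy for seeks)."""
    s = sorted(offsets)
    return sum(1 for i, o in enumerate(s) if i == 0 or o != s[i - 1] + 1)


def query_runs(lo, hi, shape, order):
    """Runs touched by the inclusive range query [lo, hi] inside one chunk."""
    cells = product(*(range(l, h + 1) for l, h in zip(lo, hi)))
    return count_runs(linearize(c, shape, order) for c in cells)


# A query spanning all of dimension 0 but a single value of dimension 1
# touches 4 separate runs under order (0, 1) and 1 run under order (1, 0).
runs_01 = query_runs((0, 0), (3, 0), (4, 4), (0, 1))
runs_10 = query_runs((0, 0), (3, 0), (4, 4), (1, 0))
```

In this toy setting, a chunk whose queries mostly look like the one above would prefer order `(1, 0)`, while a chunk with the transposed query pattern would prefer `(0, 1)`; a single global order cannot serve both. The sketch ignores sparsity, which the paper's storage method is designed to handle.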















Acknowledgements
This work is supported by the Key Program of the National Natural Science Foundation of China under grant No. 61832003 and the Major Program of the National Natural Science Foundation of China under grant No. U1811461.
Author information
Contributions
Shuai Han wrote the main manuscript text; Xianmin Liu and Jianzhong Li suggested revisions to the paper. All authors reviewed the manuscript.
Ethics declarations
Conflict of interests
The authors declare that they have no competing interests.
About this article
Cite this article
Han, S., Liu, X. & Li, J. Chunk-oriented dimension ordering for efficient range query processing on sparse multidimensional data. World Wide Web 26, 1395–1433 (2023). https://doi.org/10.1007/s11280-022-01098-z
DOI: https://doi.org/10.1007/s11280-022-01098-z