Chunk-oriented dimension ordering for efficient range query processing on sparse multidimensional data

World Wide Web

Abstract

Range query processing is central to array data management, yet evaluating range queries efficiently on sparse multidimensional data remains challenging in many applications. Range query performance depends heavily on the dimension order used, so the dimension order needs to be optimized for query performance. Prior work focuses only on optimizing a single global dimension order for the data. However, the data distribution and the query distribution may differ across parts of the data, and a global dimension order is too coarse-grained to achieve good query performance; a fine-grained dimension order optimization is needed. In this paper, to exploit the optimization opportunities of fine-grained dimension ordering for range query processing, we first design a two-level linearization method for storing and querying sparse multidimensional data. Unlike previous methods, which usually use a global dimension order, the two-level linearization method allows the dimension order to be specified separately for different parts of the data, called chunks. To realize fine-grained dimension order optimization, we present the chunk-oriented dimension ordering problem for the first time and propose workload-driven dimension ordering algorithms for the uniform case and the non-uniform independent case, respectively. Furthermore, to cope with changing workloads in practical applications, we design a dynamic dimension reordering method that tracks query trends over time and avoids query performance degradation. Finally, experiments are conducted on both synthetic and real-life data to demonstrate the effectiveness of our method.
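To make the idea of per-chunk dimension orders concrete, the following is a minimal sketch and not the paper's algorithm, which the abstract does not spell out. It assumes a regular chunk grid, a fixed order for numbering chunks, and a dense in-chunk offset; a sparse store would presumably keep only the non-empty cells sorted by this key. All names (`linearize`, `two_level_key`, `count_runs`, `chunk_orders`) are hypothetical. The sketch counts how many contiguous runs of offsets a range query touches inside one chunk under two different dimension orders.

```python
from itertools import product


def linearize(coords, shape, order):
    """Linear offset of `coords` inside a box of `shape`.

    `order` lists dimensions from slowest-varying to fastest-varying.
    """
    offset = 0
    for d in order:
        offset = offset * shape[d] + coords[d]
    return offset


def two_level_key(cell, chunk_shape, grid_shape, chunk_orders, default_order):
    """Return (chunk_id, in_chunk_offset) for a cell.

    Level 1 (assumption): chunks are numbered with one fixed order.
    Level 2: cells inside a chunk use that chunk's own dimension order.
    """
    chunk_coords = tuple(c // s for c, s in zip(cell, chunk_shape))
    local = tuple(c % s for c, s in zip(cell, chunk_shape))
    chunk_id = linearize(chunk_coords, grid_shape, default_order)
    order = chunk_orders.get(chunk_id, default_order)
    return chunk_id, linearize(local, chunk_shape, order)


def count_runs(offsets):
    """Number of maximal runs of consecutive integers in `offsets`."""
    s = sorted(offsets)
    return sum(1 for i, x in enumerate(s) if i == 0 or x != s[i - 1] + 1)


# Hypothetical 8x8 array split into 4x4 chunks (a 2x2 chunk grid).
chunk_shape, grid_shape, default_order = (4, 4), (2, 2), (0, 1)

# Range query: dimension 0 in [0, 3], dimension 1 fixed at 1 (all in chunk 0).
query_cells = list(product(range(0, 4), range(1, 2)))


def query_offsets(chunk_orders):
    return [
        two_level_key(c, chunk_shape, grid_shape, chunk_orders, default_order)[1]
        for c in query_cells
    ]


global_only = query_offsets({})         # every chunk uses order (0, 1)
per_chunk = query_offsets({0: (1, 0)})  # chunk 0 alone uses order (1, 0)
print(count_runs(global_only), count_runs(per_chunk))  # prints "4 1"
```

In this toy setting the query touches four separate runs under the global order but a single contiguous run once chunk 0 is given its own order, which is the kind of gain a workload-driven choice of per-chunk order would aim for.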

Acknowledgements

This work is supported by the Key Program of the National Natural Science Foundation of China under grant No. 61832003 and the Major Program of the National Natural Science Foundation of China under grant No. U1811461.

Author information

Contributions

Shuai Han wrote the main manuscript text; Xianmin Liu and Jianzhong Li suggested revisions to the paper. All authors reviewed the manuscript.

Corresponding author

Correspondence to Jianzhong Li.

Ethics declarations

Conflict of interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, S., Liu, X. & Li, J. Chunk-oriented dimension ordering for efficient range query processing on sparse multidimensional data. World Wide Web 26, 1395–1433 (2023). https://doi.org/10.1007/s11280-022-01098-z

  • DOI: https://doi.org/10.1007/s11280-022-01098-z
