HCIndex: a Hilbert-Curve-based clustering index for efficient multi-dimensional queries for cloud storage systems

Cluster Computing

Abstract

With the rapid development of the Internet of Things and cloud computing, HBase has become a good choice for massive data storage, offering efficient reads and writes. However, HBase does not support multi-dimensional queries on non-rowkey data, which hinders data analysis and processing. To address this issue, we first analyze the construction principles and deficiencies of the secondary index and the clustering index, and select the clustering index as the basis for optimization. Then, we choose the Hilbert curve, a space-filling curve, as the linearization technique, design a pre-partition algorithm and a subspace partition algorithm, and implement the Hilbert-curve-based clustering index (HCIndex), which supports multi-dimensional point queries and range queries. Finally, the performance of HCIndex is verified through comparison experiments with HBase Scan, HiBase and CCIndex. The experimental results show that HCIndex greatly improves query efficiency at the cost of limited additional storage: the space needed to store the index data is only 1.7 times the size of the original HBase data table. Compared with HBase Scan, HCIndex raises the efficiency of multi-dimensional point queries and range queries to more than 4 times and more than 2 times, respectively. Therefore, the proposed HCIndex is well suited for efficient multi-dimensional and complex queries over massive data in cloud storage systems.
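To make the linearization idea concrete, the following is a minimal sketch of how a two-dimensional point can be mapped to a position on a Hilbert curve, and how a rectangular range query can be turned into contiguous index intervals. It uses the well-known iterative bit-manipulation mapping and is not the authors' implementation: the function names `hilbert_index` and `range_to_intervals`, the restriction to two dimensions, and the brute-force enumeration of grid cells are all assumptions made for illustration. The paper's pre-partition and subspace partition algorithms are not reproduced here.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Map grid cell (x, y), with 0 <= x, y < 2**order, to its
    distance along a 2-D Hilbert curve of the given order."""
    n = 1 << order          # side length of the grid
    d = 0
    s = n >> 1              # current quadrant size, halved each step
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Each quadrant contributes s*s cells times its visiting rank.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def range_to_intervals(order, x_lo, x_hi, y_lo, y_hi):
    """Hypothetical helper: turn an inclusive rectangle into a sorted
    list of contiguous (start, end) Hilbert-index intervals.
    Brute force over cells; fine for a sketch, not for large ranges."""
    ds = sorted(
        hilbert_index(order, x, y)
        for x in range(x_lo, x_hi + 1)
        for y in range(y_lo, y_hi + 1)
    )
    intervals = []
    for d in ds:
        if intervals and d == intervals[-1][1] + 1:
            intervals[-1][1] = d      # extend the current run
        else:
            intervals.append([d, d])  # start a new run
    return [tuple(i) for i in intervals]
```

Under these assumptions, each interval could be served by one ordered scan over rowkeys that begin with a fixed-width encoding of the Hilbert index. Because the curve keeps spatially adjacent cells close in the one-dimensional order, a compact query rectangle tends to produce few intervals.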



Data availability

Enquiries about data availability should be directed to the authors.


Funding

This work was supported by the Fundamental Research Funds for the Central Universities (Grant No. BLX201923), National Natural Science Foundation of China (Grant No. 62072187), Guangdong Major Project of Basic and Applied Basic Research (Grant No. 2019B030302002), the Major Key Project of PCL (Grant No. PCL2021A09), Guangdong Marine Economic Development Special Fund Project (Grant No. GDNRC[2022]17), and Guangzhou Development Zone Science and Technology Project (Grant No. 2021GH10, 2020GH10).

Author information


Contributions

Xinyang Wang, Yu Sun, and Qiao Sun wrote the main manuscript text; Weiwei Lin and James Z. Wang revised the manuscript and provided revision suggestions; Yu Sun and Wei Li wrote the code and processed the experimental data.

Corresponding authors

Correspondence to Qiao Sun or Weiwei Lin.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Research involving human and animal rights

This paper does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, X., Sun, Y., Sun, Q. et al. HCIndex: a Hilbert-Curve-based clustering index for efficient multi-dimensional queries for cloud storage systems. Cluster Comput 26, 2011–2025 (2023). https://doi.org/10.1007/s10586-022-03723-y


  • DOI: https://doi.org/10.1007/s10586-022-03723-y
