HBase storage schemas for massive spatial vector data

Cluster Computing

Abstract

As Geographic Information Systems (GIS) develop, the storage requirements of spatial vector data are growing dramatically, and designing an efficient storage schema for massive spatial vector data has become a key step in building a GIS. Cloud computing with NoSQL databases such as HBase can provide highly concurrent, scalable storage services for spatial vector data. However, NoSQL storage schemas for spatial vector data are rarely reported. In this paper, two HBase storage schemas for spatial vector data are proposed: the Z schema, whose rowkeys are based on the Z curve, and the ID schema, whose rowkeys are based on the identifiers of geometry objects. In our experiments, the region query efficiency of the two schemas is tested on a cloud framework that we built, using Z curves of different orders and different query ranges. The experimental results show that, for both schemas, response time grows as the query range increases. More importantly, the response time of the Z schema is about one-fifth that of the ID schema in all cases. The Z schema is therefore the better of the two solutions for storing spatial vector data in HBase.
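To make the idea of a Z-curve rowkey concrete, the sketch below shows one common way to derive such a key: the data extent is divided into a 2^order by 2^order grid, and the bits of a cell's column and row are interleaved into a single Z (Morton) value. This is an illustration only, not the paper's actual rowkey layout. The function names, the zero-padded decimal encoding, and the `_` separator followed by an object identifier are all assumptions.

```python
def interleave_bits(col: int, row: int, order: int) -> int:
    """Interleave the low `order` bits of col and row into a Z-order value."""
    z = 0
    for i in range(order):
        z |= ((col >> i) & 1) << (2 * i)      # column bits go to even positions
        z |= ((row >> i) & 1) << (2 * i + 1)  # row bits go to odd positions
    return z


def z_rowkey(x, y, extent, order, object_id):
    """Build a hypothetical rowkey: zero-padded Z value, then the object id.

    extent is (min_x, min_y, max_x, max_y); the point is assumed to lie inside it.
    """
    min_x, min_y, max_x, max_y = extent
    cells = 1 << order  # number of grid cells per axis
    # Map the coordinate to a grid cell, clamping the upper edge into the last cell.
    col = min(int((x - min_x) / (max_x - min_x) * cells), cells - 1)
    row = min(int((y - min_y) / (max_y - min_y) * cells), cells - 1)
    z = interleave_bits(col, row, order)
    # Zero-pad so that lexicographic rowkey order matches numeric Z order.
    width = len(str(cells * cells - 1))
    return f"{z:0{width}d}_{object_id}"
```

Because HBase stores rows sorted by rowkey, keys built this way tend to place objects from nearby grid cells in nearby rows, which is the property a region query can exploit. An identifier-only rowkey carries no spatial information in its sort order.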

Author information

Corresponding author

Correspondence to Yong Wang.

Cite this article

Wang, Y., Li, C., Li, M. et al. HBase storage schemas for massive spatial vector data. Cluster Comput 20, 3657–3666 (2017). https://doi.org/10.1007/s10586-017-1253-1

  • DOI: https://doi.org/10.1007/s10586-017-1253-1
