
Data storage optimization strategy in distributed column-oriented database by considering spatial adjacency

Cluster Computing

Abstract

In a distributed column-oriented database, a scan operation can touch many fragments and trigger many unnecessary scans of partitions that hold no relevant data, which seriously degrades query efficiency, especially for spatial data. To address this problem, this paper builds on the partitioning strategy of distributed column-oriented databases and proposes a spatial data storage optimization strategy named ‘SPPS’. By taking spatial adjacency into account, the strategy stores adjacent spatial objects in the same data fragment and records the spatial information of each fragment. A spatial query can therefore locate the relevant fragments from their spatial information, and the unnecessary partition scans are reduced, which improves storage and query efficiency. To verify the ‘SPPS’ strategy, this paper carries out experiments on HBase and records spatial query efficiency with and without ‘SPPS’. The experimental results indicate that the ‘SPPS’ strategy can improve storage and query efficiency in distributed column-oriented databases.
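
The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's SPPS implementation and does not use HBase: it assumes 2D points, uses a fixed grid as a stand-in for spatial adjacency, and treats a bounding box as the "spatial information" kept for each fragment. The names `build_fragments`, `range_query`, and `cell_size` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def build_fragments(points, cell_size):
    """Group nearby points into fragments and record each fragment's bounding box.

    Assumption: a fixed grid approximates spatial adjacency. The paper's
    actual partitioning method is not described in the abstract.
    """
    cells = defaultdict(list)
    for x, y in points:
        key = (int(x // cell_size), int(y // cell_size))
        cells[key].append((x, y))

    fragments = []
    for members in cells.values():
        xs = [p[0] for p in members]
        ys = [p[1] for p in members]
        # Bounding box (min_x, min_y, max_x, max_y) is the per-fragment spatial information.
        bbox = (min(xs), min(ys), max(xs), max(ys))
        fragments.append({"bbox": bbox, "points": members})
    return fragments


def intersects(a, b):
    """Return True if two boxes (min_x, min_y, max_x, max_y) overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_query(fragments, window):
    """Scan only fragments whose bounding box overlaps the query window.

    Returns the matching points and the number of fragments actually scanned.
    """
    hits = []
    scanned = 0
    for frag in fragments:
        if not intersects(frag["bbox"], window):
            continue  # fragment pruned using its spatial information
        scanned += 1
        for x, y in frag["points"]:
            if window[0] <= x <= window[2] and window[1] <= y <= window[3]:
                hits.append((x, y))
    return hits, scanned


points = [(1, 1), (2, 2), (11, 1), (12, 3), (1, 12), (25, 25)]
fragments = build_fragments(points, cell_size=10)
hits, scanned = range_query(fragments, window=(0, 0, 5, 5))
print(len(fragments), scanned, hits)
```

In this toy data set the six points fall into four fragments, and the query window overlaps only one bounding box, so three fragments are skipped without being scanned. In a real column-oriented store the same pruning would have to be expressed through row-key design and scan ranges, which this sketch does not attempt to model.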

Acknowledgements

The authors would like to thank the following funding bodies for their support: the National Key Research and Development Program of China (No. 2016YFB0502603), the National Key Research and Development Program of China (No. 2017YFB0503704), the Natural Science Foundation of Hubei Province of China (No. ZRY2015001543), and the Fundamental Research Funds for National University, China University of Geosciences (Wuhan) (1610491B20).

Author information

Corresponding author

Correspondence to Falin Fang.

Cite this article

Zheng, K., Gu, D., Fang, F. et al. Data storage optimization strategy in distributed column-oriented database by considering spatial adjacency. Cluster Comput 20, 2833–2844 (2017). https://doi.org/10.1007/s10586-017-1081-3

  • DOI: https://doi.org/10.1007/s10586-017-1081-3
