Data storage optimization strategy in distributed column-oriented database by considering spatial adjacency

Zheng, Kun; Gu, Danpeng; Fang, Falin; Zhang, Miao; Zheng, Kang; Li, Qi

doi:10.1007/s10586-017-1081-3

Published: 08 August 2017

Volume 20, pages 2833–2844, (2017)

Cluster Computing

Abstract

In a distributed column-oriented database, a scan operation can involve many fragments and trigger many unnecessary queries against partitions that hold no relevant data, which seriously degrades query efficiency, especially for spatial data. To address this problem, this paper draws on the partitioning strategies of distributed column-oriented databases and proposes a spatial data storage optimization strategy named ‘SPPS’. By taking spatial adjacency into account, the strategy stores adjacent spatial objects in the same data fragment and retains the spatial information of each fragment. A spatial query can then locate the relevant fragments from this per-fragment spatial information, which reduces unnecessary partition scans and in turn improves storage and query efficiency. To validate ‘SPPS’, the paper reports experiments based on HBase that record spatial query efficiency with and without ‘SPPS’. The experimental results indicate that ‘SPPS’ can improve storage and query efficiency in distributed column-oriented databases.
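
The general idea in the abstract (co-locate adjacent objects, keep spatial information per fragment, skip fragments that cannot match) can be illustrated with a small sketch. This is not the SPPS algorithm, whose details are not given here. It is a minimal illustration under stated assumptions: points with non-negative coordinates are ordered by a Z-order (Morton) key, cut into fixed-size fragments, and each fragment keeps a minimum bounding rectangle (MBR). The names `morton_key`, `Fragment`, `build_fragments`, and `range_query` are hypothetical and do not come from the paper or from HBase.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]              # (x, y)
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y so nearby grid cells tend to get nearby keys.

    Assumption: x and y are non-negative integers that fit in `bits` bits.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


@dataclass
class Fragment:
    points: List[Point]
    mbr: Box  # spatial information retained for this fragment


def build_fragments(points: List[Point], capacity: int) -> List[Fragment]:
    """Group spatially adjacent points into fragments of at most `capacity` points."""
    ordered = sorted(points, key=lambda p: morton_key(int(p[0]), int(p[1])))
    fragments = []
    for start in range(0, len(ordered), capacity):
        chunk = ordered[start:start + capacity]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        fragments.append(Fragment(chunk, (min(xs), min(ys), max(xs), max(ys))))
    return fragments


def intersects(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_query(fragments: List[Fragment], window: Box) -> Tuple[List[Point], int]:
    """Return the points inside `window` and the number of fragments actually scanned."""
    hits: List[Point] = []
    scanned = 0
    for frag in fragments:
        if not intersects(frag.mbr, window):
            continue  # pruned using the fragment's MBR, no scan needed
        scanned += 1
        hits.extend(
            p for p in frag.points
            if window[0] <= p[0] <= window[2] and window[1] <= p[1] <= window[3]
        )
    return hits, scanned
```

With an 8 × 8 grid of points and a fragment capacity of 16, each fragment in this sketch covers one quadrant of the grid, so a small query window in one corner touches a single fragment instead of all four. A real deployment would map fragments to the store's own partitions and would need a policy for objects with extent (lines, polygons); both are outside this sketch.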

Acknowledgements

The authors thank the following funding sources for their support: the National Key Research and Development Program of China (No. 2016YFB0502603), the National Key Research and Development Program of China (No. 2017YFB0503704), the Natural Science Foundation of Hubei Province of China (No. ZRY2015001543), and the Fundamental Research Funds for National University, China University of Geosciences (Wuhan) (1610491B20).

Author information

Authors and Affiliations

Faculty of Information Engineering, China University of Geosciences (Wuhan), Wuhan, 430074, China
Kun Zheng, Danpeng Gu, Falin Fang, Miao Zhang, Kang Zheng & Qi Li

Corresponding author

Correspondence to Falin Fang.

About this article

Cite this article

Zheng, K., Gu, D., Fang, F. et al. Data storage optimization strategy in distributed column-oriented database by considering spatial adjacency. Cluster Comput 20, 2833–2844 (2017). https://doi.org/10.1007/s10586-017-1081-3

Received: 14 March 2017
Revised: 25 July 2017
Accepted: 27 July 2017
Published: 08 August 2017
Issue Date: December 2017
DOI: https://doi.org/10.1007/s10586-017-1081-3
