Abstract
In a distributed column-oriented database, a scan operation can touch many fragments and trigger many unnecessary partition queries, which seriously degrades query efficiency, especially for spatial data. To address this problem, this paper draws on the partitioning strategies of distributed column-oriented databases and proposes a spatial data storage optimization strategy named ‘SPPS’. The strategy takes spatial adjacency into account so that adjacent spatial objects are stored in the same data fragment, and it retains the spatial information of each fragment. A spatial query can then locate the relevant fragments from that spatial information, which reduces unnecessary partition scans and thereby improves storage and query efficiency. To verify the effectiveness of ‘SPPS’, this paper reports experiments on HBase that compare spatial query efficiency with and without ‘SPPS’. The experimental results indicate that ‘SPPS’ can improve storage and query efficiency in distributed column-oriented databases.
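The pruning idea described above can be illustrated with a minimal Python sketch. This is not the authors' SPPS implementation: it assumes 2D point data, uses a uniform grid as a simple stand-in for spatial adjacency, and keeps everything in memory instead of in HBase. All names (`build_fragments`, `range_query`, `cell_size`) are hypothetical.

```python
from collections import defaultdict


def build_fragments(points, cell_size):
    """Group nearby points into fragments and record each fragment's bounding box.

    Assumption: a uniform grid stands in for the adjacency-aware partitioning;
    the paper's actual partitioning rule is not reproduced here.
    """
    cells = defaultdict(list)
    for x, y in points:
        key = (int(x // cell_size), int(y // cell_size))
        cells[key].append((x, y))

    fragments = []
    for members in cells.values():
        xs = [p[0] for p in members]
        ys = [p[1] for p in members]
        # The retained "spatial information" of a fragment: (min_x, min_y, max_x, max_y).
        bbox = (min(xs), min(ys), max(xs), max(ys))
        fragments.append({"bbox": bbox, "points": members})
    return fragments


def intersects(bbox, query):
    """Return True if two (min_x, min_y, max_x, max_y) rectangles overlap."""
    return not (
        bbox[2] < query[0] or bbox[0] > query[2]
        or bbox[3] < query[1] or bbox[1] > query[3]
    )


def range_query(fragments, query):
    """Scan only fragments whose bounding box overlaps the query rectangle.

    Returns the matching points and the number of fragments actually scanned.
    """
    hits = []
    scanned = 0
    for frag in fragments:
        if not intersects(frag["bbox"], query):
            continue  # pruned: this fragment cannot contain a match
        scanned += 1
        for x, y in frag["points"]:
            if query[0] <= x <= query[2] and query[1] <= y <= query[3]:
                hits.append((x, y))
    return hits, scanned


if __name__ == "__main__":
    pts = [(1, 1), (2, 2), (11, 11), (12, 12), (25, 3)]
    frags = build_fragments(pts, cell_size=10)
    matches, scanned = range_query(frags, (0, 0, 5, 5))
    print(len(frags), "fragments;", scanned, "scanned;", sorted(matches))
```

In this toy setup the query rectangle overlaps only one fragment's bounding box, so the other fragments are skipped without reading their contents, which is the effect the strategy aims for at the scale of a distributed store.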
Acknowledgements
The authors would like to thank the following funding sources for their support: the National Key Research and Development Program of China (No. 2016YFB0502603), the National Key Research and Development Program of China (No. 2017YFB0503704), the Natural Science Foundation of Hubei Province of China (No. ZRY2015001543) and the Fundamental Research Funds for National University, China University of Geosciences (Wuhan) (1610491B20).
Cite this article
Zheng, K., Gu, D., Fang, F. et al. Data storage optimization strategy in distributed column-oriented database by considering spatial adjacency. Cluster Comput 20, 2833–2844 (2017). https://doi.org/10.1007/s10586-017-1081-3
DOI: https://doi.org/10.1007/s10586-017-1081-3