ABSTRACT
Cloud storage is a form of external storage that can provide unlimited storage space with high availability and low maintenance cost. Geospatial data, meanwhile, is already very large and is growing dramatically, which makes it hard to store in a local data warehouse. Given these benefits, geospatial data is well suited to being stored in Cloud storage while being managed by a local data warehouse. However, there is a gap between Cloud storage and data warehouses built on traditional infrastructure, such as the widely adopted massively parallel processing (MPP) data warehouses. In this paper, we therefore propose a middleware-like architecture that connects an MPP data warehouse to Cloud storage. Through a set of technical designs, it supports traditional geospatial data retrieval while integrating the Cloud storage lineage. Using a prototype system and practical data, we demonstrate appreciable performance and flexibility for third-party development. Another major contribution of this paper is that we implement the prototype on an open-source data warehouse and release it to the public as open source.