ABSTRACT
We are in the era of Spatial Big Data. Owing to advances in topographic techniques, high-quality satellite imagery, and diverse means of collecting information, geospatial datasets are growing in volume, complexity, and heterogeneity. For example, OpenStreetMap data for the whole world is about 1 TB, and NASA world climate datasets are about 17 TB. The volume and variety of spatial data make spatial computations both data-intensive and compute-intensive. Because spatial data are irregularly distributed, domain decomposition is challenging. In this work, we present a spatial data partitioning technique that takes spatial join cost into account. In addition, we present a spatial join computation built on the Asynchronous Dynamic Load Balancing (ADLB) library, a software library designed to help rapidly build scalable parallel programs using MPI. We evaluated the performance of the ADLB-based MPI-GIS implementation. In our earlier work, the cost of moving spatial data from ADLB servers to worker MPI processes limited the scalability of MPI-GIS.
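The abstract does not specify how spatial join cost enters the partitioning, so the following is only a minimal, hypothetical sketch of the general idea, not the MPI-GIS method. It assumes a uniform grid over a known extent, assigns each object to every cell its minimum bounding rectangle (MBR) overlaps, estimates a cell's join cost as the number of candidate pairs (|A_cell| x |B_cell|), and then balances cells across processes with the greedy longest-processing-time (LPT) heuristic. The function names (`cells_for`, `cell_join_costs`, `assign_cells`) are illustrative; no MPI or ADLB calls are used.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Cell = Tuple[int, int]                    # (column, row) in a uniform grid


def cells_for(rect: Rect, extent: Rect, nx: int, ny: int) -> List[Cell]:
    """Return every grid cell (i, j) that the rectangle's MBR overlaps."""
    x0, y0, x1, y1 = extent
    w = (x1 - x0) / nx
    h = (y1 - y0) / ny
    # Clamp indices so rectangles touching the upper boundary stay in range.
    i0 = min(nx - 1, max(0, int((rect[0] - x0) // w)))
    i1 = min(nx - 1, max(0, int((rect[2] - x0) // w)))
    j0 = min(ny - 1, max(0, int((rect[1] - y0) // h)))
    j1 = min(ny - 1, max(0, int((rect[3] - y0) // h)))
    return [(i, j) for i in range(i0, i1 + 1) for j in range(j0, j1 + 1)]


def cell_join_costs(layer_a: List[Rect], layer_b: List[Rect],
                    extent: Rect, nx: int, ny: int) -> Dict[Cell, int]:
    """Assumed cost model: candidate pairs per cell = |A_cell| * |B_cell|."""
    count_a: Dict[Cell, int] = defaultdict(int)
    count_b: Dict[Cell, int] = defaultdict(int)
    for r in layer_a:
        for c in cells_for(r, extent, nx, ny):
            count_a[c] += 1
    for r in layer_b:
        for c in cells_for(r, extent, nx, ny):
            count_b[c] += 1
    # Cells holding objects from only one layer produce no join work.
    return {c: n * count_b[c] for c, n in count_a.items() if c in count_b}


def assign_cells(costs: Dict[Cell, int], nprocs: int) -> List[List[Cell]]:
    """Greedy LPT: give the costliest remaining cell to the least-loaded process."""
    heap = [(0, p) for p in range(nprocs)]  # (current load, process rank)
    heapq.heapify(heap)
    parts: List[List[Cell]] = [[] for _ in range(nprocs)]
    for cell, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(heap)
        parts[p].append(cell)
        heapq.heappush(heap, (load + cost, p))
    return parts


if __name__ == "__main__":
    extent = (0.0, 0.0, 4.0, 4.0)
    layer_a = [(0.1, 0.1, 0.5, 0.5), (0.2, 0.2, 0.6, 0.6), (2.5, 2.5, 3.0, 3.0)]
    layer_b = [(0.3, 0.3, 0.9, 0.9), (2.6, 2.6, 2.8, 2.8), (2.1, 0.1, 2.2, 0.2)]
    costs = cell_join_costs(layer_a, layer_b, extent, 2, 2)
    print(costs)                   # {(0, 0): 2, (1, 1): 1}
    print(assign_cells(costs, 2))  # [[(0, 0)], [(1, 1)]]
```

Weighting cells by estimated pair count rather than by object count is what makes this sketch "join-cost aware": a dense cell in only one layer contributes no work. Objects that span several cells are replicated here, so a real join would also need duplicate-result elimination.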
Index Terms
- Spatial Data Decomposition and Load Balancing on HPC Platforms