Abstract
In the big data era, high-resolution raster-based geocomputation has been widely employed in geospatial studies. The algorithms used in local map algebra operations are data-intensive, requiring large amounts of memory and massive computing power. Simply deploying such applications on a distributed computing framework such as Hadoop introduces storage and performance issues. In this paper, we present a two-level storage strategy designed specifically for MapReduce implementations of local map algebra algorithms under Hadoop. The approach enables efficient storage and manipulation of large raster datasets through three processes: (1) partitioning a raster file into sets of square tiles, (2) compressing and reorganizing these tile sets so that no tile straddles two data divisions, and (3) improving MapReduce's I/O interfaces for exchanging data during parallel map algebra computation. Experiments with real-world datasets show that the proposed strategy achieves high speedup and efficiency for raster-based spatial analysis applications. The results also show that the strategy scales satisfactorily as the number of data nodes in the cluster or the volume of raster data increases.
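Processes (1) and (2), together with a local map algebra operation running over the resulting divisions, can be illustrated with a minimal single-process Python sketch. It is not the authors' Hadoop implementation: the function names, the NODATA padding value, zlib compression, and the greedy byte budget that stands in for an HDFS block or input split are all assumptions made for illustration. Process (3), the modified MapReduce I/O interfaces, is not modelled; each division is simply processed in a loop, as one map task would process it.

```python
import struct
import zlib

NODATA = -9999.0  # hypothetical fill value used to pad edge tiles to a square


def partition(raster, tile):
    """Split a 2-D raster (list of rows) into square tile x tile blocks.

    Returns {(tile_row, tile_col): tile}; edge tiles are padded with NODATA.
    """
    n_rows, n_cols = len(raster), len(raster[0])
    tiles = {}
    for tr in range(0, n_rows, tile):
        for tc in range(0, n_cols, tile):
            tiles[(tr // tile, tc // tile)] = [
                [raster[r][c] if r < n_rows and c < n_cols else NODATA
                 for c in range(tc, tc + tile)]
                for r in range(tr, tr + tile)
            ]
    return tiles


def compress_tile(tile):
    """Serialize a tile as little-endian float64 values and zlib-compress it."""
    flat = [v for row in tile for v in row]
    return zlib.compress(struct.pack(f"<{len(flat)}d", *flat))


def decompress_tile(data, tile):
    """Inverse of compress_tile for a tile x tile block."""
    flat = struct.unpack(f"<{tile * tile}d", zlib.decompress(data))
    return [list(flat[i * tile:(i + 1) * tile]) for i in range(tile)]


def pack_into_divisions(compressed, division_bytes):
    """Greedily group compressed tiles into divisions of at most division_bytes.

    A tile is never split across two divisions, so each map task can decode
    its tiles without reading bytes stored in a neighbouring division.
    """
    divisions, current, used = [], [], 0
    for key in sorted(compressed):
        size = len(compressed[key])
        if size > division_bytes:
            raise ValueError(f"tile {key} ({size} B) exceeds the division size")
        if used + size > division_bytes:  # start a new division
            divisions.append(current)
            current, used = [], 0
        current.append(key)
        used += size
    if current:
        divisions.append(current)
    return divisions


def local_op(fn, divisions, compressed_layers, tile):
    """Apply a cell-by-cell function to co-registered tiles, division by division.

    A local operation needs only the cells at the same location in each
    input layer, so no neighbouring tiles or reduce step are required.
    """
    out = {}
    for division in divisions:  # each division plays the role of one map task
        for key in division:
            layers = [decompress_tile(c[key], tile) for c in compressed_layers]
            out[key] = [
                [NODATA if NODATA in cells else fn(*cells) for cells in zip(*rows)]
                for rows in zip(*layers)
            ]
    return out


# Usage: add two 5 x 5 layers cell by cell using 2 x 2 tiles.
a = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
b = [[1.0] * 5 for _ in range(5)]
T = 2
ca = {k: compress_tile(t) for k, t in partition(a, T).items()}
cb = {k: compress_tile(t) for k, t in partition(b, T).items()}
divs = pack_into_divisions(ca, division_bytes=128)
result = local_op(lambda x, y: x + y, divs, [ca, cb], T)
```

Because a local operation reads only co-located cells, keeping every tile wholly inside one division is what lets each task run independently; a focal or zonal operation would additionally need neighbouring tiles, which this sketch does not handle.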
Acknowledgments
The authors would like to thank Dengcheng Xia from China University of Geosciences for help with the source code. This study was partially supported by the National Natural Science Foundation of China under grant No. 41871304, the Fundamental Research Funds for the Central Universities of China University of Geosciences (Wuhan) under grant No. CUG2018JM14, and the National Key Research and Development Program of China under grant No. 2017YFB0503804.
Additional information
Communicated by: H. Babaie
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhang, J., Zhou, S., Liang, T. et al. A two-level storage strategy for map-reduce enabled computation of local map algebra. Earth Sci Inform 13, 479–492 (2020). https://doi.org/10.1007/s12145-020-00452-x
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12145-020-00452-x