A two-level storage strategy for map-reduce enabled computation of local map algebra

  • Research Article
  • Earth Science Informatics

Abstract

In the big data era, high-resolution raster-based geocomputation has been widely employed in geospatial studies. The algorithms used in local map algebra operations are data-intensive, requiring large amounts of memory and massive computing power. Simply running such applications on a distributed computing framework such as Hadoop raises storage and performance issues. In this paper, we present a two-level storage strategy designed specifically for MapReduce implementations of local map algebra algorithms under Hadoop. The approach achieves efficient storage and manipulation of large raster datasets through three processes: (1) partitioning a raster file into sets of square tiles, (2) compressing and reorganizing these tile sets so that no tile overlaps two data divisions, and (3) improving MapReduce's I/O interfaces for data exchange during the parallel computation of map algebra. Experiments with real-world datasets show that the proposed strategy achieves high speedup and efficiency for raster-based spatial analysis applications. The results also show that the strategy scales satisfactorily as the number of data nodes in the cluster or the raster data volume increases.
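The three processes listed in the abstract can be illustrated with a small, single-machine Python sketch. It is not the authors' Hadoop implementation, which this preview does not show. It only assumes that a raster fits in a list of lists of floats, that tiles are serialized as float64 values and compressed with `zlib`, and that a "data division" is a fixed byte budget standing in for an HDFS block or input split. All function names (`partition_into_tiles`, `encode_tile`, `decode_tile`, `pack_tiles`, `map_division`) and the greedy packing rule are hypothetical choices made for illustration.

```python
import struct
import zlib
from typing import Callable, Dict, Iterator, List, Tuple

Raster = List[List[float]]
TileKey = Tuple[int, int]  # (tile_row, tile_col) in the tile grid


def partition_into_tiles(raster: Raster, tile_size: int) -> Dict[TileKey, Raster]:
    """Process (1): cut the raster into square tile_size x tile_size tiles.
    Edge tiles are smaller when the raster size is not a multiple of tile_size."""
    tiles: Dict[TileKey, Raster] = {}
    rows, cols = len(raster), len(raster[0])
    for r0 in range(0, rows, tile_size):
        for c0 in range(0, cols, tile_size):
            tiles[(r0 // tile_size, c0 // tile_size)] = [
                row[c0:c0 + tile_size] for row in raster[r0:r0 + tile_size]
            ]
    return tiles


def encode_tile(tile: Raster) -> bytes:
    """Process (2a), assumed codec: 8-byte (height, width) header plus
    little-endian float64 cells, compressed with zlib."""
    h, w = len(tile), len(tile[0])
    payload = struct.pack(f"<II{h * w}d", h, w, *(v for row in tile for v in row))
    return zlib.compress(payload)


def decode_tile(blob: bytes) -> Raster:
    """Inverse of encode_tile."""
    data = zlib.decompress(blob)
    h, w = struct.unpack_from("<II", data)
    cells = struct.unpack_from(f"<{h * w}d", data, 8)
    return [list(cells[i * w:(i + 1) * w]) for i in range(h)]


def pack_tiles(tiles: Dict[TileKey, Raster],
               division_bytes: int) -> List[List[Tuple[TileKey, bytes]]]:
    """Process (2b), assumed greedy rule: append compressed tiles to the current
    division until the next one would exceed division_bytes, then start a new
    division. A tile is therefore never split across two divisions."""
    divisions: List[List[Tuple[TileKey, bytes]]] = [[]]
    used = 0
    for key in sorted(tiles):
        blob = encode_tile(tiles[key])
        if len(blob) > division_bytes:
            raise ValueError(f"tile {key} ({len(blob)} B) exceeds the division size")
        if used + len(blob) > division_bytes:
            divisions.append([])
            used = 0
        divisions[-1].append((key, blob))
        used += len(blob)
    return divisions


def map_division(division: List[Tuple[TileKey, bytes]],
                 op: Callable[[float], float]) -> Iterator[Tuple[TileKey, Raster]]:
    """Stand-in for process (3), the map side: read whole tiles from one division
    and apply a local (cell-by-cell) map algebra operation to each of them."""
    for key, blob in division:
        tile = decode_tile(blob)
        yield key, [[op(v) for v in row] for row in tile]


if __name__ == "__main__":
    raster = [[float(r * 6 + c) for c in range(6)] for r in range(5)]
    tiles = partition_into_tiles(raster, 4)
    print(sorted(tiles))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
    divisions = pack_tiles(tiles, division_bytes=200)
    result = dict(kv for d in divisions for kv in map_division(d, lambda v: 2 * v + 1))
```

A local operation reads only the co-located cell, so each tile can be processed without any neighbouring data. Keeping every compressed tile whole inside a single division means a mapper never has to fetch the remainder of a tile from another block. That is one plausible reading of why the abstract stresses preventing tile overlap across data divisions.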

Figures 1–10 (images not reproduced)

Acknowledgments

The authors would like to thank Dengcheng Xia of China University of Geosciences for help with the source code. This study was partially supported by the National Natural Science Foundation of China under grant No. 41871304, the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan), under grant No. CUG2018JM14, and the National Key Research and Development Program of China under grant No. 2017YFB0503804.

Author information

Correspondence to Jianbo Zhang.

Additional information

Communicated by: H. Babaie

About this article

Cite this article

Zhang, J., Zhou, S., Liang, T. et al. A two-level storage strategy for map-reduce enabled computation of local map algebra. Earth Sci Inform 13, 479–492 (2020). https://doi.org/10.1007/s12145-020-00452-x

  • DOI: https://doi.org/10.1007/s12145-020-00452-x