Computers & Geosciences

Volume 37, Issue 2, February 2011, Pages 165-176
Optimizing grid computing configuration and scheduling for geospatial analysis: An example with interpolating DEM

https://doi.org/10.1016/j.cageo.2010.05.015

Abstract

Many geographic analyses are time-consuming and do not scale well when large datasets are involved. For example, the interpolation of DEMs (digital elevation models) over large geographic areas can become a problem in practical applications, especially web applications such as terrain visualization, where a fast response is required and computational demands exceed the capacity of a traditional single processing unit performing serial processing. Therefore, high performance and parallel computing approaches, such as grid computing, were investigated to speed up geographic analysis algorithms such as DEM interpolation. The keys to grid computing are configuring an optimized grid computing platform for the geospatial analysis and optimally scheduling the geospatial tasks within that platform; however, little research has focused on either. Using DEM interpolation as an example, we report our systematic research on configuring and scheduling a high performance grid computing platform to improve the performance of geographic analyses, through a study of how the numbers of cores, processors, and grid nodes, different network connections, and concurrent requests impact the speedup of geospatial analyses. Condor, a grid middleware, is used to schedule the DEM interpolation tasks for the different grid configurations. A raster-based DEM of Kansas is used for a case study, and an inverse distance weighting (IDW) algorithm is used in the interpolation experiments.

Introduction

Many geographic problems pose significant computational challenges in response to multiple emerging needs (Yang and Raskin, 2009):

  • Large volumes of distributed data. For example, satellites collect terabytes to petabytes of geospatial data from space on a daily basis. In situ sensors and social activities are also accumulating data at a comparable pace.

  • Complex spatial analysis methods. These computationally intensive methods extend across a broad spectrum of spatial and temporal scales, and are now gaining widespread acceptance.

  • Rapid response times. Concurrent user accesses require web-based applications with fast access and rapid response times.

The interaction between the above factors further contributes to the challenge of processing geospatial data.

Fortunately, research in recent years has shown that grid computing can effectively address these computing demands in a distributed fashion (Armstrong et al., 2005, Yang et al., 2005). Grid computing is the integration of high-speed internet, high-performance computers, large-scale databases, sensors, remote devices, etc., to provide computing resource support for data- or computing-intensive applications (Foster and Karonis, 1998). With the continual decline in prices for computer hardware and networks, it has become practical for most laboratories with limited funding to deploy a grid computing platform. However, there is no reported research on how to utilize publicly available computing resources to configure a grid platform with the best performance. Geospatial analysis applications have special requirements that cannot be matched by generic grid computing platforms, because most geospatial analysis algorithms are not designed to leverage multiple CPUs and grid computing middleware has generally not been developed for geospatial applications. Therefore, there is an urgent need to investigate how geospatial analyses can leverage grid computing to improve performance, for example to address regional- to global-level high resolution data processing requirements (Liu et al., 2006). One of the most important issues is how to organize and configure a good grid computing platform for geospatial analyses and applications, and how to schedule the computing resources in the computing pool (Rahman et al., 2010). This paper uses digital elevation model (DEM) interpolation as an example to investigate how to configure and schedule a better grid computing platform to improve the performance of geographic analyses. It offers insights into a computing solution for geospatial analyses and provides guidance for developing middleware that better schedules jobs.

This paper addresses these challenges and designs a set of experiments to study the impact of different configurations on grid computing. The results can be adapted by GIS experts to improve overall grid platform performance for other geospatial applications, such as model simulation, by configuring an optimized computing pool. The research also intends to provide insights for IT experts developing middleware to improve grid computing support for scientific problems. We aim to answer the following questions for geospatial applications:

  • What is the comparable performance of homogeneous and heterogeneous grid nodes?

  • What are the best network configurations for a grid computing pool?

  • How should grid nodes be selected: multiprocessors with multi-core chips (two or more multi-core processors), a single multi-core processor (two or more cores on the same chip), multiprocessors with single-core chips, or a single processor?

  • What are the potential bottlenecks in multi-core technology and how to possibly avoid/solve them?

  • What is the impact of concurrent requests on the performance of a grid computing pool?

The experiments utilize different grid computing configurations to support DEM interpolation. During this process, the DEM domain is decomposed into subdomains in a way that balances the workload across the grid computing platform: a uniform grid decomposition partitions the entire DEM into several subdomains of equal size. A DEM of the state of Kansas, whose nearly rectangular boundary is well suited to grid decomposition, is selected as the study data. Since the purpose of this paper is not to evaluate the accuracy of interpolation results but to demonstrate the impact of different grid computing environment configurations on the DEM interpolation process, and a different interpolation method would not change the findings, the popular and easy-to-use IDW interpolation algorithm is applied to interpolate the DEM datasets. The Condor middleware is utilized to schedule the DEM interpolation tasks.
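The uniform decomposition and IDW steps described above can be sketched in Python. This is a minimal illustration under our own assumptions, not the authors' code: the function names and the NumPy-based vectorization are ours, and IDW here uses all sample points rather than a search neighborhood.

```python
import numpy as np

def decompose(bounds, nx, ny):
    """Uniform grid decomposition: split a rectangular domain
    (xmin, ymin, xmax, ymax) into nx * ny equal-sized subdomains."""
    xmin, ymin, xmax, ymax = bounds
    xs = np.linspace(xmin, xmax, nx + 1)
    ys = np.linspace(ymin, ymax, ny + 1)
    return [(xs[i], ys[j], xs[i + 1], ys[j + 1])
            for j in range(ny) for i in range(nx)]

def idw(sample_xy, sample_z, grid_xy, power=2.0):
    """Inverse distance weighting: each grid point's value is the
    weighted mean of sample values, with weights 1 / distance**power."""
    # Pairwise distances: (n_grid, n_samples)
    d = np.linalg.norm(grid_xy[:, None, :] - sample_xy[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # avoid division by zero at sample locations
    w = 1.0 / d ** power
    return (w @ sample_z) / w.sum(axis=1)
```

In a grid computing setting, each tuple returned by `decompose` would define one independent interpolation job, so the tiles can be interpolated in parallel and mosaicked afterwards.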

Section 2 introduces grid computing architecture, the benefits and challenges of high performance technology, grid computing platforms, and methodologies used for fast interpolation of high resolution DEMs. Section 3 introduces the DEM data and data processing used in this paper. Section 4 reports and discusses our research and results on the comparative performance of grid computing environments with different numbers of CPU cores and CPUs, different network connections among computers, and concurrent requests. Section 5 concludes, suggests solutions for time-consuming geographic analyses using grid computing, and discusses future research directions.

Grid computing architecture

Grid computing supports distributed user requests and achieves optimal resource scheduling by implementing distributed collaborative mechanisms. A grid computing platform includes three layers: (a) a resource layer, which includes a variety of data resources, computing resources, devices, and other resources connected through computer networks; (b) to achieve resource sharing in a distributed network environment, intelligent management mechanisms are required for resource discovery and dynamic

Data and data processing

Kansas was selected as a study region due to its near-rectangular boundary, which makes it well suited to grid decomposition. The Kansas DEM was downloaded from the National Map Seamless Server (NMSS). NMSS provides 1/9 arc-sec high resolution data, 1/3 arc-sec USGS DEMs, 1 arc-sec USGS DEMs, 2 arc-sec USGS DEMs, and 3 arc-sec USGS DEMs in the ArcGrid format; the 2 arc-sec DEMs are used only in Alaska and the 3 arc-sec DEMs are used only to fill in values over some large

Grid computing platform

The Joint Center of Intelligent Spatial Computing (CISC) at George Mason University (GMU) hosts a grid-based computing pool as illustrated in Fig. 4. In this computing pool, Condor is used as a middleware to dispatch and execute the jobs.
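Dispatching one interpolation job per subdomain through Condor can be described with a submit file along the following lines. This is a hedged sketch rather than the authors' actual configuration: the executable name and its argument are hypothetical, while `universe`, `$(Process)`, and `queue` are standard Condor submit-description constructs.

```
universe   = vanilla
executable = idw_interpolate        # hypothetical worker program
arguments  = --tile $(Process)      # $(Process) indexes the subdomain
output     = tile_$(Process).out
error      = tile_$(Process).err
log        = idw.log
queue 16                            # one job per subdomain, e.g. a 4 x 4 decomposition
```

Condor then matches each queued job to an idle node in the pool, which is how the experiments can compare scheduling behavior across different pool configurations.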

The efficiency of the grid platform was tested in different configurations, from homogeneous and heterogeneous grid nodes and multi-core architectures to the computer networks. For the homogeneous grid vs. heterogeneous grid and communication latency experiments,

Conclusion and discussion

This paper reports our research on how the configuration and scheduling of grid computing will impact the efficiency of a grid platform for spatial analysis using the DEM interpolation as an example. The study uses the publicly available 7.5 minute USGS DEM data of Kansas, and the inverse distance weighting (IDW) algorithm for DEM interpolation. Different domain decompositions of the Kansas DEM are used to test the grid computing performance. Five sets of experiments are conducted to investigate

Acknowledgements

The research reported is supported by FGDC CAP program grants (08HQPA0002 and G09AC00103) and a 2007 NASA grant (NNX07AD99G).

References (31)

  • Chai, L., Gao, Q., Dhabaleswar, K., 2007. Understanding the impact of multi-core architecture in cluster computing: a...
  • Cramer, B.E., et al., 1999. An evaluation of domain decomposition strategies for parallel spatial interpolation of surfaces. Geographical Analysis.
  • Denning, P.J., 1968. Thrashing: its causes and prevention. In: Proceedings of the Fall Joint Computer Conference, Part...
  • Dong, Y., Fu, B., Yoshiki, N., 2008. DEM generation methods and applications in revealing of topographic changes caused...
  • Eklundh, L., et al., 1995. Rapid generation of digital elevation models from topographic maps. International Journal of Geographical Information Science.