ComDA: A common software for nonlinear and Non-Gaussian Land Data Assimilation

https://doi.org/10.1016/j.envsoft.2020.104638

Highlights

  • ComDA is developed to build a common platform for data assimilation applications

  • Parallel computing, multidisciplinary support and scalability ensure ComDA works well

  • ComDA promotes studies on the applications of land data assimilation

Abstract

Common software for land data assimilation is urgently needed to implement a wide variety of assimilation applications; however, a fast, easy-to-use, multidisciplinary and application-oriented assimilation platform has not yet been achieved. Therefore, we developed the Common software for Nonlinear and non-Gaussian Land Data Assimilation (ComDA). ComDA integrates multiple algorithms (including diverse Kalman and particle filters), models and observation operators (e.g., the Common Land Model (CoLM) and the Advanced Integral Equation Model (AIEM)), and provides general interfaces for additional operators. Using mixed-language programming and parallel computing technologies (Open Multi-Processing (OpenMP), Message Passing Interface (MPI) and Compute Unified Device Architecture (CUDA)), ComDA can assimilate various land surface variables and remote sensing observations. High-performance computing tests, synthetic tests and real-world tests indicate that ComDA meets the standard for common land data assimilation software: it offers parallel computation, multiple operators and assimilation algorithms, and compatibility with many models. ComDA can be applied to multidisciplinary data assimilation.

Introduction

As a new methodology in Earth system science, land data assimilation combines Earth observations with numerical land surface models to better understand and predict land surface processes. Generally, because of limited understanding of Earth science concepts and the inherent errors of measurement methods, uncertainties arise that have profound impacts on the accuracy of Earth observations (Crow et al., 2012) and modeling (Merz et al., 2009). Data assimilation takes advantage of Earth observations, modeling and their uncertainties and provides a more effective framework for studying land surface processes (Talagrand, 1997; Liang et al., 2013; Li, 2014). Data assimilation consequently places higher demands on computer development environments for specific applications.

To date, data assimilation has become a widely accepted methodology that has been applied in a variety of research fields, including hydrology (Liu and Gupta, 2007), the carbon cycle (Rayner et al., 2005), climatology (Carton and Giese, 2008; Fang and Li, 2016), and phenology (Ines et al., 2013). Its theoretical basis has been strengthened by ongoing frontier research, such as multiscale assimilation (Bocquet et al., 2011), nonlinear and non-Gaussian methods (Apte et al., 2007; Han and Li, 2008; van Leeuwen, 2015), and stochastic analysis (Miller, 2007; Liu and Li, 2017). Correspondingly, the development of data assimilation systems has been in full swing. Typical systems include the Global Land Data Assimilation System (GLDAS, Rodell et al., 2004), the European Land Data Assimilation System (ELDAS, Jacobs et al., 2008), the Chinese Land Data Assimilation System (CLDAS, Li et al., 2007), the Canadian Land Data Assimilation System (CaLDAS, Balsamo et al., 2007), and the Earth Observation Land Data Assimilation System (EOLDAS, Lewis et al., 2012).

Several software efforts have targeted general-purpose platforms for data assimilation studies, for example, DART (Data Assimilation Research Testbed, Anderson et al., 2009), OpenDA (Open Data Assimilation library) and OpenMI (Model Interface) (Ridler et al., 2014). Common software for land data assimilation should meet the following requirements. First, parallel computation is necessary for rapid research output. An assimilation system tends to be time-consuming, especially with high-dimensional and highly complex land surface models, and must cope with large volumes of forcing data, state variables and observations. Additionally, a parallel framework benefits ensemble-based algorithms, gridded data and models designed for parallel execution. Second, multiple dynamic models are needed to cover the wide range of applications, and multiple observation types are needed to assimilate more Earth observations. Moreover, each additional observation type requires a corresponding observation operator, such as a radiative transfer model when assimilating remote sensing data. Third, various algorithms are needed to reduce potential algorithmic errors and to extend the applicability of the software. The ensemble Kalman filter (EnKF) and particle filter (PF) are widely used assimilation algorithms; however, they are computationally demanding because they require a large number of ensemble members (or particles) to approximate the model trajectory. Additionally, EnKF is based on the multidimensional Gaussian assumption (Fowler and van Leeuwen, 2013), which may limit its performance on non-Gaussian problems. Therefore, integrating advanced KF and PF variants is necessary. Last, the software must leave sufficient room for future extensions, so that new dynamic models and measurements can be introduced without substantial programming, which the land data assimilation community also demands.
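The difference between Gaussian-based and particle-based updates can be illustrated with a minimal sketch of one bootstrap particle filter analysis step. This is not ComDA code: the function name `pf_analysis`, the scalar state, the bimodal prior and the Gaussian observation likelihood are all assumptions chosen for illustration.

```python
import math
import random


def pf_analysis(particles, obs, obs_std, rng):
    """One bootstrap particle filter analysis step for a scalar state.

    Hypothetical helper for illustration: each particle is weighted by a
    Gaussian observation likelihood, then the set is resampled with
    replacement so that likely particles are duplicated.
    """
    # Unnormalized likelihood weights p(obs | particle).
    weights = [math.exp(-0.5 * ((obs - p) / obs_std) ** 2) for p in particles]
    total = sum(weights)
    if total == 0.0:
        # Every particle is far from the observation; fall back to the prior.
        return list(particles)
    weights = [w / total for w in weights]
    # Multinomial resampling with replacement.
    return rng.choices(particles, weights=weights, k=len(particles))


rng = random.Random(0)
# Bimodal (non-Gaussian) prior: half the particles near -2, half near +2.
prior = ([rng.gauss(-2.0, 0.3) for _ in range(500)]
         + [rng.gauss(2.0, 0.3) for _ in range(500)])
posterior = pf_analysis(prior, obs=1.8, obs_std=0.5, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
```

Because the weights depend on the full likelihood rather than on the ensemble mean and covariance alone, the resampled particles concentrate in the mode that agrees with the observation. An EnKF would instead update every member linearly from the ensemble mean and covariance, which treats this bimodal prior as if it were Gaussian.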

These requirements constitute a benchmark for the development of common land data assimilation software. In particular, widely used software, such as DART and the Land Information System (LIS) data assimilation framework (Kumar et al., 2008), supports a large range of land surface models and remote sensing observations (for example, Advanced Microwave Scanning Radiometer-Earth Observing System (AMSR-E) and Moderate Resolution Imaging Spectroradiometer (MODIS)) but imposes restrictions on customized models and algorithms for general users. Other related works also have shortcomings. OpenDA and OpenMI provide an interface standard to bridge the assimilation system and numerical models but are not suited to user-friendly, fast applications because they lack realistic models. Karssenberg et al. (2010) proposed a visualized software framework that also contains PF. Other software, including PDAF (Parallel Data Assimilation Framework, Nerger and Hiller, 2013), DasPy (Han et al., 2015), TerrSysMP-PDAF (Kurtz et al., 2016) and the Soil and Water Assessment Tool-Hydrological Data Assimilation System (SWAT-HDAS) (Zhang et al., 2017), provides limited forecasting models (for example, the Community Land Model or SWAT), and its assimilation algorithms are restricted to Kalman filters and PF (i.e., EnKF and local ensemble transform Kalman filter). Therefore, general software for land data assimilation and its corresponding applications is still urgently needed.

In this paper, we introduce the Common software for Nonlinear and non-Gaussian Land Data Assimilation (ComDA), which we developed to meet the above demands for a general data assimilation software platform. This paper is organized as follows. The next section presents the system framework of ComDA and describes the integrated models, methods and other techniques in detail. Sections 3 and 4 present two case studies based on ComDA, one of which employs the Chinese Land Data Assimilation System. These sections also explain how to extend ComDA by introducing a new model. High-performance computing in ComDA is presented in Section 5. ComDA is discussed in Section 6, and conclusions are drawn in Section 7.

Section snippets

General design

ComDA (see Fig. 1) is a software platform oriented toward multisource observations (for example, remote sensing and in situ data) and land surface applications that can run distributed algorithms and work on multiple operating systems. This software platform includes multiple models (for both forecasting operators and observation operators) and various assimilation algorithms and supports fast development and assimilation data analysis across different fields, such as ecology, hydrology and

Tests with Lorenz Model

A complete assimilation test is conducted using simple models, such as the Lorenz model, and the assimilation schemes integrated in ComDA. This test demonstrates the routine for running an instance in ComDA. In addition, simple synthetic models introduce fewer uncertainties into the assimilation system, which reduces interfering information and provides an ideal, concise instance for comparison with other data assimilation software.
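A synthetic (twin) experiment of this kind can be sketched in a few dozen lines. The sketch below is not ComDA's implementation. It assumes the classic three-variable Lorenz (1963) system with its standard parameters, because this excerpt does not show which Lorenz variant or settings the authors used, and it assumes a perturbed-observation EnKF that assimilates the observations one at a time. All function names, the ensemble size, the observation interval and the noise level are hypothetical choices.

```python
import math
import random


def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivative of the Lorenz (1963) system (assumed model)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = lorenz63(state)
    k2 = lorenz63(shifted(state, k1, dt / 2))
    k3 = lorenz63(shifted(state, k2, dt / 2))
    k4 = lorenz63(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def enkf_update_scalar(ensemble, obs, obs_std, index, rng):
    """Assimilate one scalar observation of state component `index`."""
    n = len(ensemble)
    dim = len(ensemble[0])
    hx = [member[index] for member in ensemble]
    hx_mean = sum(hx) / n
    means = [sum(member[j] for member in ensemble) / n for j in range(dim)]
    var_hx = sum((h - hx_mean) ** 2 for h in hx) / (n - 1)
    # Kalman gain per component: cov(state_j, hx) / (var(hx) + R).
    gain = []
    for j in range(dim):
        cov = sum((member[j] - means[j]) * (h - hx_mean)
                  for member, h in zip(ensemble, hx)) / (n - 1)
        gain.append(cov / (var_hx + obs_std ** 2))
    updated = []
    for member, h in zip(ensemble, hx):
        # Each member sees its own perturbed copy of the observation.
        innovation = obs + rng.gauss(0.0, obs_std) - h
        updated.append(tuple(member[j] + gain[j] * innovation
                             for j in range(dim)))
    return updated


def twin_experiment(n_members=30, n_cycles=50, steps_per_cycle=10,
                    dt=0.01, obs_std=1.0, seed=1):
    """Return the time-averaged RMSE of the analysis mean against the truth."""
    rng = random.Random(seed)
    truth = (1.0, 1.0, 1.0)
    for _ in range(1000):  # spin up onto the attractor
        truth = rk4_step(truth, dt)
    ensemble = [tuple(t + rng.gauss(0.0, 2.0) for t in truth)
                for _ in range(n_members)]
    sq_err = 0.0
    for _cycle in range(n_cycles):
        # Forecast step: propagate the truth and every member.
        for _step in range(steps_per_cycle):
            truth = rk4_step(truth, dt)
            ensemble = [rk4_step(member, dt) for member in ensemble]
        # Analysis step: observe x, y and z with independent noise.
        for j in range(3):
            obs = truth[j] + rng.gauss(0.0, obs_std)
            ensemble = enkf_update_scalar(ensemble, obs, obs_std, j, rng)
        mean = [sum(member[j] for member in ensemble) / n_members
                for j in range(3)]
        sq_err += sum((mean[j] - truth[j]) ** 2 for j in range(3)) / 3
    return math.sqrt(sq_err / n_cycles)


rmse = twin_experiment()
```

Calling `twin_experiment()` returns the root-mean-square error of the analysis mean relative to the synthetic truth. As a rough diagnostic under these assumptions, a value well above `obs_std` would suggest that the filter has diverged.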

In the following tests, the employed Lorenz model is

Study on assimilating airborne remote sensing data

An assimilation test using ComDA was conducted in an irrigation district in the midstream region of the Heihe River Basin. SiB2 and EnKF were implemented with multiple-source measurements to improve the prediction of land surface soil moisture in the study area. The corresponding forcing data, including vapor pressure, wind speed, air temperature, precipitation and shortwave/longwave downward radiation, were collected by an eddy covariance and large aperture scintillometer system in the HiWATER project (Li et

High performance computing in ComDA

Four different HPC solutions are embedded in ComDA (also see Table 1):

  • a. OpenMP, a shared-memory parallel programming interface that makes full use of a multicore CPU architecture.

  • b. MPI, a distributed HPC method that supports parallel computing and message exchange across multiple nodes (that is, many computers connected by a network).

  • c. OpenMP + MPI, which combines the techniques of OpenMP and MPI to utilize their advantages and achieve a higher

Discussion

The objective of this study is to develop a data assimilation software platform for a wide range of applications in land surface research; therefore, some essential characteristics should be considered: integration of commonly used land surface models and observation models (most of which are radiative transfer models for remote sensing data), classical and advanced assimilation algorithms, usable interfaces between different modules and strong extensibility for further developments and

Summary

This study presents a new general software (ComDA) solution for the land data assimilation community. The advantages of ComDA are the integration of multiple forecasting operators (including CoLM, SiB2, LPJ-DGVM, NOAH LSM, SHAW, VIC-3L, GEOtop, Lorenz and the stochastic Lorenz model) and observation operators (including AIEM, Q/h, MEMLS, PROSAIL, etc.) and implementation of various parallel computing techniques (OpenMP, MPI and CUDA). Furthermore, with the adoption of multiple algorithms

Declarations of competing interest

None.

Acknowledgements

This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences [grant numbers XDA20100104]; the National Natural Science Foundation of China [grant numbers 41730642]; the 13th Five-year Informatization Plan of Chinese Academy of Sciences [grant numbers XXH13505-06]; the National Natural Science Foundation of China [grant numbers 41801270]; and the Foundation for Excellent Youth Scholars of NIEER, CAS.

References (69)

  • S.V. Kumar et al., A land surface data assimilation framework using the land information system: description and applications, Adv. Water Resour. (2008)

  • P. Lewis et al., An Earth observation land data assimilation system (EO-LDAS), Remote Sens. Environ. (2012)

  • F.N. Lei et al., Improving the estimation of hydrological states in SWAT model via the ensemble Kalman smoother: synthetic experiments for the Heihe Basin in northwest China, Adv. Water Resour. (2014)

  • X. Li, Characterization, controlling, and reduction of uncertainties in the modeling and observation of land-surface systems, Sci. China Earth Sci. (2014)

  • X. Li et al., Frozen soil parameterization in SiB2 and its validation with GAME-Tibet observations, Cold Reg. Sci. Technol. (2003)

  • X. Li et al., A very fast simulated re-annealing (VFSA) approach for land data assimilation, Comput. Geosci. UK (2004)

  • C. Matzler et al., Extension of the microwave emission model of layered snowpacks to coarse-grained snow, Remote Sens. Environ. (1999)

  • R.N. Miller, Topics in data assimilation: stochastic processes, Physica D (2007)

  • L. Nerger et al., Software for ensemble-based data assimilation systems – implementation strategies and scalability, Comput. Geosci. UK (2013)

  • M.E. Ridler et al., Data assimilation framework: linking an open data assimilation library (OpenDA) to a widely adopted model interface (OpenMI), Environ. Model. Software (2014)

  • J. Wang et al., Estimating near future regional corn yields by integrating multi-source observations into a crop growth model, Eur. J. Agron. (2013)

  • J. Anderson et al., The data assimilation research testbed: a community facility, Bull. Am. Meteorol. Soc. (2009)

  • Y. Bai et al., Evolutionary algorithm-based error parametrization methods for data assimilation, Mon. Weather Rev. (2011)

  • G. Balsamo et al., A land data assimilation system for soil moisture and temperature: an information content study, J. Hydrometeorol. (2007)

  • M. Bocquet et al., Bayesian design of control space for optimal assimilation of observations. Part I: consistent multiscale formalism, Q. J. Roy. Meteorol. Soc. (2011)

  • J.A. Carton et al., A reanalysis of ocean climate using Simple Ocean Data Assimilation (SODA), Mon. Weather Rev. (2008)

  • K.S. Chen et al., Emission of rough surfaces calculated by the integral equation method with comparison to three-dimensional moment method simulations, IEEE Trans. Geosci. Rem. Sens. (2003)

  • W.T. Crow et al., Upscaling sparse ground-based soil moisture observations for the validation of coarse-resolution satellite soil moisture products, Rev. Geophys. (2012)

  • Y.J. Dai et al., The common land model, Bull. Am. Meteorol. Soc. (2003)

  • M.B. Ek et al., Implementation of Noah land surface model advances in the National Centers for Environmental Prediction operational mesoscale Eta model, J. Geophys. Res. Atmos. (2003)

  • M. Fang et al., Paleoclimate data assimilation: its motivation, progress and prospects, Sci. China Earth Sci. (2016)

  • G.N. Flerchinger, The Simultaneous Heat and Water (SHAW) Model: Technical Documentation (2000)

  • A. Fowler et al., Observation impact in data assimilation: the effect of non-Gaussian observation error, Tellus A (2013)

  • X.J. Han et al., DasPy 1.0 – the open source multivariate land data assimilation framework in combination with the community land model 4.5, Geosci. Model Dev. Discuss. (GMDD) (2015)