ComDA: A common software for nonlinear and Non-Gaussian Land Data Assimilation
Introduction
As a new methodology in Earth system science, land data assimilation combines Earth observations with numerical land surface models to better understand and predict land surface processes. Because our understanding of Earth system processes is limited and measurement methods carry inherent errors, uncertainties arise that strongly affect the accuracy of both Earth observations (Crow et al., 2012) and models (Merz et al., 2009). Data assimilation exploits observations, models and their respective uncertainties, and thus provides a more effective framework for studying land surface processes (Talagrand, 1997; Liang et al., 2013; Li, 2014). Consequently, data assimilation places high demands on the software environments used to build specific applications.
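The idea of weighting a model forecast and an observation by their uncertainties can be illustrated with the scalar Kalman (optimal interpolation) update. The sketch below is illustrative and not ComDA code: it assumes unbiased Gaussian errors, a directly observed state, and made-up soil moisture values; the function name is hypothetical.

```python
def analysis_update(x_f: float, var_f: float, y: float, var_o: float) -> tuple:
    """Combine a model forecast x_f (error variance var_f) with an
    observation y of the same quantity (error variance var_o).

    Returns (analysis state, analysis error variance). This is the scalar
    Kalman / optimal-interpolation update, which assumes unbiased,
    uncorrelated Gaussian errors and an identity observation operator.
    """
    gain = var_f / (var_f + var_o)   # Kalman gain: weight given to the observation
    x_a = x_f + gain * (y - x_f)     # move the forecast toward the observation
    var_a = (1.0 - gain) * var_f     # analysis is more certain than either input
    return x_a, var_a


# Hypothetical soil moisture example (m^3/m^3): model says 0.20, sensor says 0.30.
x_a, var_a = analysis_update(x_f=0.20, var_f=0.004, y=0.30, var_o=0.001)
print(round(x_a, 3), round(var_a, 4))  # 0.28 0.0008
```

Because the observation error variance is smaller than the forecast error variance, the analysis lands closer to the observation, and its variance is smaller than both input variances.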
To date, data assimilation has become a widely accepted methodology applied in a variety of research fields, including hydrology (Liu and Gupta, 2007), the carbon cycle (Rayner et al., 2005), climatology (Carton and Giese, 2008; Fang and Li, 2016), and phenology (Ines et al., 2013). Its theoretical basis has been strengthened by ongoing frontier research, such as multiscale assimilation (Bocquet et al., 2011), nonlinear and non-Gaussian methods (Apte et al., 2007; Han and Li, 2008; van Leeuwen, 2015), and stochastic analysis (Miller, 2007; Liu and Li, 2017). Correspondingly, the development of data assimilation systems has progressed rapidly. Typical systems include the Global Land Data Assimilation System (GLDAS, Rodell et al., 2004), the European Land Data Assimilation System (ELDAS, Jacobs et al., 2008), the Chinese Land Data Assimilation System (CLDAS, Li et al., 2007), the Canadian Land Data Assimilation System (CaLDAS, Balsamo et al., 2007), and the Earth Observation Land Data Assimilation System (EOLDAS, Lewis et al., 2012).
Several software efforts have developed general platforms for data assimilation studies, for example, DART (Data Assimilation Research Testbed, Anderson et al., 2009) and the coupling of OpenDA (Open Data Assimilation library) with OpenMI (Open Modelling Interface) (Ridler et al., 2014). Common software for land data assimilation should meet the following requirements. First, parallel computation is necessary for producing research results rapidly. Assimilation systems tend to be time-consuming, especially with high-dimensional and highly complex land surface models, and they must handle large volumes of forcing data, state variables or observations. In addition, a parallel framework benefits ensemble-based algorithms, gridded data and models that are designed for parallel execution. Second, multiple dynamic models are needed to cover a wide range of applications, and multiple observation types are needed to assimilate more Earth observations. Additional observation types in turn require corresponding observation operators, such as radiative transfer models for assimilating remote sensing data. Third, various algorithms are needed to reduce the potential errors associated with any single algorithm and to broaden the applicability of the software. The ensemble Kalman filter (EnKF) and the particle filter (PF) are widely used assimilation algorithms; however, they are computationally demanding because they require large ensembles to approximate the model trajectory. Moreover, the EnKF relies on a multivariate Gaussian assumption (Fowler and van Leeuwen, 2013), which may limit its performance in non-Gaussian problems. Therefore, integrating advanced KF and PF variants is necessary. Last, the software must leave room for future extensions, so that new dynamic models and measurements can be introduced without substantial programming, which is also in demand in the land data assimilation community.
These requirements constitute a benchmark for common land data assimilation software. In particular, widely used software, such as DART and the Land Information System (LIS) data assimilation framework (Kumar et al., 2008), supports a large range of land surface models and remote sensing observations (for example, from the Advanced Microwave Scanning Radiometer-Earth Observing System (AMSR-E) and the Moderate Resolution Imaging Spectroradiometer (MODIS)) but imposes restrictions on customized models and algorithms for general users. Other related works also have shortcomings. OpenDA and OpenMI provide an interface standard that bridges the assimilation system and numerical models, but they are not suited to user-friendly, rapid applications because they lack realistic models. Karssenberg et al. (2010) proposed a visualized software framework that also includes the PF. Other software, including PDAF (Parallel Data Assimilation Framework, Nerger and Hiller, 2013), DasPy (Han et al., 2015), TerrSysMP-PDAF (Kurtz et al., 2016) and the Soil and Water Assessment Tool-Hydrological Data Assimilation System (SWAT-HDAS) (Zhang et al., 2017), provides a limited set of forecasting models (for example, the Community Land Model or SWAT), and its assimilation algorithms are restricted to Kalman filters and the PF (e.g., the EnKF and the local ensemble transform Kalman filter). Therefore, general software for land data assimilation and its corresponding applications is still urgently needed.
In this paper, we introduce the Common software for Nonlinear and non-Gaussian Land Data Assimilation (ComDA), which we developed to meet the above demands for a general data assimilation software platform. The paper is organized as follows. Section 2 presents the system framework of ComDA and describes the integrated models, methods and other techniques in detail. Sections 3 and 4 present two case studies based on ComDA, one of which employs the Chinese Land Data Assimilation System. These sections also explain how ComDA can be extended by introducing a new model. High performance computing in ComDA is described in Section 5. Section 6 discusses ComDA, and Section 7 draws conclusions.
General design
ComDA (see Fig. 1) is a software platform oriented toward multisource observations (for example, remote sensing and in situ data) and land surface applications; it can run distributed algorithms and works on multiple operating systems. The platform integrates multiple models (both forecasting operators and observation operators) and various assimilation algorithms, and it supports fast development and analysis of assimilation results across different fields, such as ecology, hydrology and
Tests with the Lorenz model
A complete assimilation test is conducted using simple models, such as the Lorenz model, together with the assimilation schemes integrated in ComDA. This test illustrates the routine for running an instance in ComDA. In addition, simple synthetic models introduce fewer uncertainties into the assimilation system, which reduces interfering information and provides a concise, idealized case for comparison with other data assimilation software.
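The excerpt does not show which Lorenz model ComDA uses or its settings. As a minimal sketch, assuming the classical three-variable Lorenz (1963) system with σ = 10, ρ = 28 and β = 8/3, the code below integrates it with a fourth-order Runge-Kutta scheme and generates the "truth" trajectory and noisy observations that a synthetic (twin) experiment needs. The function names and all settings (time step, observation interval, observation noise) are illustrative assumptions.

```python
import random
from typing import List, Tuple

State = Tuple[float, float, float]


def lorenz63(s: State, sigma: float = 10.0, rho: float = 28.0,
             beta: float = 8.0 / 3.0) -> State:
    """Right-hand side (dx/dt, dy/dt, dz/dt) of the Lorenz (1963) system."""
    x, y, z = s
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(s: State, dt: float) -> State:
    """Advance the state by one fourth-order Runge-Kutta step."""
    def add(a: State, b: State, c: float) -> State:
        return tuple(ai + c * bi for ai, bi in zip(a, b))  # a + c * b

    k1 = lorenz63(s)
    k2 = lorenz63(add(s, k1, dt / 2))
    k3 = lorenz63(add(s, k2, dt / 2))
    k4 = lorenz63(add(s, k3, dt))
    return tuple(si + dt / 6 * (a + 2 * b + 2 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))


def twin_experiment(x0: State, n_steps: int, dt: float, obs_every: int,
                    obs_sd: float, seed: int = 0):
    """Return a 'truth' trajectory and (step, noisy x) observations taken
    every obs_every steps, as used in synthetic assimilation tests."""
    rng = random.Random(seed)
    truth: List[State] = [x0]
    obs: List[Tuple[int, float]] = []
    for k in range(1, n_steps + 1):
        truth.append(rk4_step(truth[-1], dt))
        if k % obs_every == 0:
            obs.append((k, truth[-1][0] + rng.gauss(0.0, obs_sd)))
    return truth, obs


truth, obs = twin_experiment((1.0, 1.0, 1.0), n_steps=1000, dt=0.01,
                             obs_every=25, obs_sd=1.0)
print(len(truth), len(obs))  # 1001 40
```

An assimilation run would start a perturbed ensemble from a different initial state, advance it with `rk4_step`, and update it at each observation time with a filter such as the EnKF.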
In the following tests, the employed Lorenz model is
Study on assimilating airborne remote sensing data
An assimilation test using ComDA is conducted in an irrigation district in the midstream region of the Heihe River Basin. SiB2 and the EnKF are used with multisource measurements to improve land surface soil moisture prediction in the study area. The corresponding forcing data, including vapor pressure, wind speed, air temperature, precipitation, and shortwave/longwave downward radiation, were collected by an eddy covariance and large aperture scintillometer system in the HiWATER project (Li et
High performance computing in ComDA
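Ensemble-based assimilation parallelizes naturally, because each ensemble member can be advanced independently between analysis times. The following is a minimal sketch of one common way to divide that work, a block distribution of member indices across processes; the function names are hypothetical and this is not ComDA's implementation. The loop over ranks runs serially here so the example is self-contained; in a real MPI program (for example, with mpi4py), each process would take its rank from `MPI.COMM_WORLD.Get_rank()` and the process count from `Get_size()`, advance only its own slice, and the pieces would be collected with `comm.gather`.

```python
from typing import Callable, List


def member_range(n_members: int, n_ranks: int, rank: int) -> range:
    """Contiguous block of ensemble-member indices owned by `rank`.

    The first n_members % n_ranks ranks get one extra member, so the
    load differs by at most one member between ranks.
    """
    base, extra = divmod(n_members, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def forecast_ensemble(ens: List[float], n_ranks: int,
                      model: Callable[[float], float]) -> List[float]:
    """Serial stand-in for a distributed forecast: each 'rank' advances only
    its own members, then the pieces are gathered back in member order."""
    gathered: List[float] = []
    for rank in range(n_ranks):  # under MPI, each process would run one iteration
        gathered.extend(model(ens[i]) for i in member_range(len(ens), n_ranks, rank))
    return gathered


print([list(member_range(10, 4, r)) for r in range(4)])
# [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

Contiguous blocks keep each process's members adjacent in memory and make the gather step a simple concatenation; a shared-memory (OpenMP-style) version would instead hand the same per-member loop to a thread pool.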
Four different HPC solutions are embedded in ComDA (also see Table 1):
- a. OpenMP, an application programming interface for shared-memory parallel computing that can fully exploit a multicore CPU.
- b. MPI, a distributed HPC approach based on message passing, which enables parallel computation and information exchange across multiple nodes (i.e., many computers connected by a network).
- c. OpenMP + MPI, which combines the techniques of OpenMP and MPI to utilize their advantages and achieve a higher
Discussion
The objective of this study is to develop a data assimilation software platform for a wide range of applications in land surface research; therefore, several essential characteristics must be considered: integration of commonly used land surface models and observation models (most of which are radiative transfer models for remote sensing data), classical and advanced assimilation algorithms, interfaces between different modules, and strong extensibility for further developments and
Summary
This study presents a new general software solution (ComDA) for the land data assimilation community. The advantages of ComDA include the integration of multiple forecasting operators (including CoLM, SiB2, LPJ-DGVM, Noah LSM, SHAW, VIC-3L, GEOtop, the Lorenz model and the stochastic Lorenz model) and observation operators (including AIEM, the Q/h model, MEMLS, PROSAIL, etc.) and the implementation of various parallel computing techniques (OpenMP, MPI and CUDA). Furthermore, with the adoption of multiple algorithms
Declarations of competing interest
None.
Acknowledgements
This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences [grant number XDA20100104]; the National Natural Science Foundation of China [grant number 41730642]; the 13th Five-year Informatization Plan of Chinese Academy of Sciences [grant number XXH13505-06]; the National Natural Science Foundation of China [grant number 41801270]; and the Foundation for Excellent Youth Scholars of NIEER, CAS.
References (69)
- Sampling the posterior: an approach to non-Gaussian data assimilation. Physica D (2007).
- Modeled analysis of the biophysical nature of spectral shifts and comparison with information content of broad bands. Remote Sens. Environ. (1992).
- Assimilating passive microwave remote sensing data into a land surface model to improve the estimation of snow depth. Remote Sens. Environ. (2014).
- Comparison of ensemble-based state and parameter estimation methods for soil moisture data assimilation. Adv. Water Resour. (2015).
- An evaluation of the nonlinear/non-Gaussian filters for the sequential data assimilation. Remote Sens. Environ. (2008).
- Assimilating multi-source data into land surface model to simultaneously improve estimations of soil moisture, soil temperature, and surface turbulent fluxes in irrigated fields. Agric. For. Meteorol. (2016).
- Experiments of one-dimensional soil moisture assimilation system based on ensemble Kalman filter. Remote Sens. Environ. (2008).
- Retrieving soil temperature profile by assimilating MODIS LST products with ensemble Kalman filter. Remote Sens. Environ. (2008).
- Assimilation of remotely sensed soil moisture and vegetation with a crop simulation model for maize yield prediction. Remote Sens. Environ. (2013).
- A software framework for construction of process-based stochastic spatio-temporal models and data assimilation. Environ. Model. Software (2010).
- A land surface data assimilation framework using the Land Information System: description and applications. Adv. Water Resour.
- An Earth observation land data assimilation system (EO-LDAS). Remote Sens. Environ.
- Improving the estimation of hydrological states in SWAT model via the ensemble Kalman smoother: synthetic experiments for the Heihe Basin in northwest China. Adv. Water Resour.
- Characterization, controlling, and reduction of uncertainties in the modeling and observation of land-surface systems. Sci. China Earth Sci.
- Frozen soil parameterization in SiB2 and its validation with GAME-Tibet observations. Cold Reg. Sci. Technol.
- A very fast simulated re-annealing (VFSA) approach for land data assimilation. Comput. Geosci. UK.
- Extension of the microwave emission model of layered snowpacks to coarse-grained snow. Remote Sens. Environ.
- Topics in data assimilation: stochastic processes. Physica D.
- Software for ensemble-based data assimilation systems: implementation strategies and scalability. Comput. Geosci. UK.
- Data assimilation framework: linking an open data assimilation library (OpenDA) to a widely adopted model interface (OpenMI). Environ. Model. Software.
- Estimating near future regional corn yields by integrating multi-source observations into a crop growth model. Eur. J. Agron.
- The Data Assimilation Research Testbed: a community facility. Bull. Am. Meteorol. Soc.
- Evolutionary algorithm-based error parametrization methods for data assimilation. Mon. Weather Rev.
- A land data assimilation system for soil moisture and temperature: an information content study. J. Hydrometeorol.
- Bayesian design of control space for optimal assimilation of observations. Part I: consistent multiscale formalism. Q. J. Roy. Meteorol. Soc.
- A reanalysis of ocean climate using Simple Ocean Data Assimilation (SODA). Mon. Weather Rev.
- Emission of rough surfaces calculated by the integral equation method with comparison to three-dimensional moment method simulations. IEEE Trans. Geosci. Rem. Sens.
- Upscaling sparse ground-based soil moisture observations for the validation of coarse-resolution satellite soil moisture products. Rev. Geophys.
- The Common Land Model. Bull. Am. Meteorol. Soc.
- Implementation of Noah land surface model advances in the National Centers for Environmental Prediction operational mesoscale Eta model. J. Geophys. Res. Atmos.
- Paleoclimate data assimilation: its motivation, progress and prospects. Sci. China Earth Sci.
- The Simultaneous Heat and Water (SHAW) Model: Technical Documentation.
- Observation impact in data assimilation: the effect of non-Gaussian observation error. Tellus A.
- DasPy 1.0: the open source multivariate land data assimilation framework in combination with the Community Land Model 4.5. Geosci. Model Dev. Discuss. (GMDD).
Cited by (13)
A novel strategy to assimilate category variables in land-use models based on Dirichlet distribution
2022, Environmental Modelling and Software. Citation excerpt: However, data assimilation, also known as model-data fusion, has received limited attention compared to other calibration approaches, although it can explicitly address uncertainty issues that arise in predicting land use change (Levy et al., 2018; Van der kwast et al., 2011, 2012). Data assimilation is an approach to determine the best unbiased estimate of model state variables or parameters by combining different source observations and dynamical models (Li et al., 2007, 2010, 2014, 2021; Liu et al., 2020). Most importantly, data assimilation can quantify the uncertainties of various errors (Levy et al., 2018; Liu et al., 2017; Verstegen et al., 2016; Bai et al., 2013; Liu and Gupta, 2007).
Big data assimilation to improve the predictability of COVID-19
2020, Geography and Sustainability. Citation excerpt: We integrated data assimilation, parameter estimation, and infectious disease models to predict COVID-19 spread. We retrospectively forecasted the COVID-19 outbreak in Wuhan by using Common software for Data Assimilation (ComDA) (Liu et al., 2020b), which not only integrates the classic algorithms of data assimilation and parameter estimation, i.e., Metropolis-Hasting (MH) (Zhu et al., 2014) and Ensemble Kalman Filter (EnKF), but also employs the contact network model and SEIR model as the model operators. In addition, we successfully reproduced the "Diamond Princess" epidemic by employing Bayesian inference and MH parameter estimation methods combined with an infectious disease model.
Using the contact network model and Metropolis-Hastings sampling to reconstruct the COVID-19 spread on the “Diamond Princess”
2020, Science Bulletin. Citation excerpt: Note that the configuration is arbitrary due to the lack of knowledge of ensemble simulation in epidemiology, which highlights the need for analyzing the sensitivity of parameters and initial values in epidemic models. We implement epidemic reconstruction based on a common software for data assimilation development (ComDA [36]). This software is used to fuse the available information into dynamics and produce more reliable predictions or simulations.
Prediction of the COVID-19 spread in African countries and implications for prevention and control: A case study in South Africa, Egypt, Algeria, Nigeria, Senegal and Kenya
2020, Science of the Total Environment. Citation excerpt: Here, the MH sampling method is used to estimate the parameters of the improved SEIR model. Under the condition that the parameters are independent, the algorithm repeatedly learning the daily confirmed cases data, sampling and iterating in the multidimensional space composed of parameters (α, β, γ−1, δ−1, λ, κ), to obtain the optimal estimation of the parameters posterior information by constructing the likelihood function (Li et al., 2020; Liu et al., 2020a; Liu et al., 2020b; Ma et al., 2017; Zhu et al., 2014). The MH parameter optimization algorithm requires that the ranges of pre-specified optimized parameters, after that the sampling iteration is carried out on the multidimensional parameter space.