Elsevier

Environmental Modelling & Software

Volume 39, January 2013, Pages 247-262
Toward self-describing and workflow integrated Earth system models: A coupled atmosphere-ocean modeling system application

https://doi.org/10.1016/j.envsoft.2012.02.013

Abstract

The complexity of Earth system models and their applications is increasing as a consequence of scientific advances, user demand, and the ongoing development of computing platforms, storage systems and distributed high-resolution observation networks. Multi-component Earth system models need to be redesigned to ease interactions among model components and with applications external to the modeling system. To that end, the common component interfaces of Earth system models can be redesigned to increase interoperability between models and other applications such as web services, data portals and science gateways. The models can also be made self-describing so that the many configuration options, build options and inputs of a simulation are recorded. In this paper, we present a coupled modeling system that applies the proposed methodology to create self-describing models with common model component interfaces. The coupled atmosphere-ocean modeling system is also integrated into a scientific workflow system to simplify routine modeling tasks and the relationships between them, and to demonstrate the enhanced interoperability between different technologies and components. The work environment is then tested using a realistic Earth system modeling application. As this example shows, a layered design for collecting provenance and metadata has the added benefit of documenting a run in far greater detail than before. In this way, it facilitates exploration and understanding of simulations and can make them reproducible. In addition to designing self-describing Earth system models, the regular modeling tasks are simplified and automated by a scientific workflow that provides meaningful abstractions for the model, the computing environment and the provenance/metadata collection mechanisms.
Our aim here is to solve a specific instance of a complex model integration problem by using a framework and a scientific workflow approach together. The methods presented in this paper might also be generalized to other types of Earth system models, leading to improved ease of use and flexibility. The initial results also show that the coupled atmosphere-ocean model, which is controlled by the designed workflow environment, is able to reproduce the Mediterranean Sea surface temperature when compared with the CCSM3 data used as initial and boundary conditions.

Introduction

With the continual advances in high-performance computing systems and observation networks over the last few decades, scientific tasks related to modeling the Earth system have become increasingly complex and computationally demanding. Examples of the complexities in modeling systems include components with non-standard programming interfaces, preparing or processing input data with various file formats and grid types, running a modeling system on a variety of computational resources (computational grids, clusters or shared-memory systems), keeping track of model parameter changes, and analyzing results with ad hoc metadata structures. To reduce these complexities and promote understanding, programming interfaces among different Earth system models must be standardized, and self-describing modeling systems must be designed to increase interoperability among models and external applications and technologies such as scientific workflow systems, metadata/data portals, web services and scientific gateways.

In this context, ‘self-describing’ means that metadata and provenance information about the model components, inputs, and parameters used in a simulation are produced along with the data output. A similar concept is widely used in data formats such as NetCDF and HDF, which are specialized for Earth system science and remote sensing. The convention for ‘self-describing’ models provides a definitive description of the model and its parameters, metadata about the data transferred among different model components, the input and output fields, the build system and the runtime environment. Using common conventions for designing ‘self-describing’ models promotes easy-to-use and efficient Earth system models for users and developers. Because the implementation of this concept in Earth system modeling is a relatively new approach, the development of tools and methodologies that enable the design of self-describing Earth system models is still an open research area. Fortunately, many efforts are currently underway to create easy-to-use multi-component Earth system models with common component interfaces, along with their applications. These efforts can be categorized into two main groups: modeling frameworks and scientific workflow systems.
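The idea of producing metadata along with the data output can be illustrated with a minimal Python sketch. It is not taken from the modeling system described in this paper: the function names, the record fields and the file names are hypothetical, and JSON is assumed as the storage format only for brevity.

```python
import json
from datetime import datetime, timezone


def describe_run(component, parameters, inputs, outputs):
    """Build a metadata record for one model run (all field names are hypothetical)."""
    return {
        "component": component,
        "parameters": dict(parameters),
        "inputs": list(inputs),
        "outputs": list(outputs),
        "created": datetime.now(timezone.utc).isoformat(),
    }


def write_sidecar(record, path):
    """Store the record as JSON next to the data file it describes."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)


# Illustrative values only; they do not come from the paper.
record = describe_run(
    component="ocean",
    parameters={"time_step_s": 300, "vertical_levels": 32},
    inputs=["boundary_conditions.nc"],
    outputs=["sst.nc"],
)
```

Calling `write_sidecar(record, "sst.metadata.json")` would place the description beside the output file, so that the output and the settings that produced it travel together.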

A modeling framework is an environment for coupling model components and couplers of different kinds of Earth subsystem models through a common calling interface. The main advantage of the modeling framework approach is that it reduces the complexity of the routine tasks involved in designing coupled modeling systems (e.g. interpolation between different grids, transferring data among model components) and helps to increase the efficiency and interoperability of the different model components. It also simplifies the synchronization of the execution of individual model components and the exchange of data/metadata among them. The Earth System Modeling Framework (ESMF) is one of the most popular examples of this approach. The ESMF consists of a superstructure for coupling components of Earth system applications in a standardized way and an infrastructure of robust, high-performance utilities and data structures that ensure consistent component behavior (Hill et al., 2004, 2006; Collins et al., 2005). In this context, standardization means that the multi-agency consortium responsible for developing ESMF agreed to conform to a set of specific interfaces in order to improve the interoperability of their models. Unlike many other examples of the framework approach such as the Model Coupling Toolkit (MCT; Jacob et al., 2005; Larson et al., 2005), the Model Coupling Environmental Library (MCEL; Bettencourt, 2002) and OASIS (Redler et al., 2010), ESMF also provides the ability to store and export component-level (physical model and coupler), grid-level and field-level (physical variable such as temperature) metadata as XML and other documents. The latest public release of ESMF (5.2.0r) also allows the use of different metadata conventions such as Climate and Forecast (CF), ISO standards and the METAFOR Common Information Model (CIM) to store and write the gathered metadata. The main goal of METAFOR, a project funded by the European Union (EU), is to describe climate simulations. The 5th Coupled Model Intercomparison Project (CMIP5) is the latest example to show the importance of such conventions (e.g. CIM) in Earth system science applications. This feature can be used to create preliminary examples of self-describing Earth system models that conform to popular standards and conventions, which are also used in portals. In this way, it also achieves interoperability between the designed self-describing Earth system model and data/metadata portals (e.g. the Earth System Grid, ESG).
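To make the notion of a common calling interface concrete, the following Python sketch shows two toy components driven through the same three calls. It is only an illustration under stated assumptions: it does not use the ESMF API, the class and method names (`Component`, `initialize`, `run`, `finalize`, `run_coupled`) are hypothetical, and the update rules are placeholders rather than physics.

```python
from abc import ABC, abstractmethod


class Component(ABC):
    """Hypothetical common interface; every model component exposes the same calls."""

    def __init__(self, name):
        self.name = name
        self.exports = {}  # fields offered to the other components
        self.metadata = {"name": name, "steps_run": 0}

    @abstractmethod
    def initialize(self):
        """Set the initial export fields."""

    @abstractmethod
    def advance(self, imports):
        """Update the export fields from the imported ones."""

    def run(self, imports):
        self.advance(imports)
        self.metadata["steps_run"] += 1

    def finalize(self):
        # Report the exported field names so that the run describes itself.
        self.metadata["export_fields"] = sorted(self.exports)
        return self.metadata


class ToyAtmosphere(Component):
    def initialize(self):
        self.exports = {"heat_flux": 0}

    def advance(self, imports):
        # Placeholder rule, not real physics: flux follows the ocean temperature.
        self.exports = {"heat_flux": imports["sst"] - 10}


class ToyOcean(Component):
    def initialize(self):
        self.exports = {"sst": 15}

    def advance(self, imports):
        # Placeholder rule, not real physics: cool by one unit while flux is positive.
        cooling = 1 if imports["heat_flux"] > 0 else 0
        self.exports = {"sst": self.exports["sst"] - cooling}


def run_coupled(components, steps):
    """Driver that relies only on the common interface, never on a specific model."""
    for component in components:
        component.initialize()
    for _ in range(steps):
        for component in components:
            # Each component imports the fields exported by all the others.
            imports = {}
            for other in components:
                if other is not component:
                    imports.update(other.exports)
            component.run(imports)
    return [component.finalize() for component in components]


summary = run_coupled([ToyAtmosphere("atmosphere"), ToyOcean("ocean")], steps=3)
```

Because the driver touches only `initialize`, `run` and `finalize`, a third component could be added to the list without changing `run_coupled`, which is the kind of interoperability the framework approach aims for.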

In recent decades, the popularity of community-based modeling systems has grown so that multiple groups can address difficult modeling problems together. A community-based modeling system is defined as an open-source suite of modeling components coupled in a framework (Voinov et al., 2010). The technologies used to support these systems are improving, but many still face challenges with respect to interoperability and metadata. For example, the Community Modeling and Analysis System (CMAS) includes many different components specific to air-quality modeling applications, such as atmosphere and air-quality models and emission-processing tools. To couple the Community Multiscale Air Quality (CMAQ) model with the MM5 (Grell et al., 1995) atmosphere model, CMAS uses the MM5 Meteorology Coupler. The main disadvantage of this approach is that the coupler is designed to work with a specific atmosphere model, so the user needs to develop new couplers to work with other models such as WRF (Michalakes et al., 2001; Janjic et al., 2001). Generic coupler libraries that provide common component interfaces can help to solve this interoperability problem. The Community Surface Dynamics Modeling System (CSDMS) is another example of a community modeling system. The CSDMS is based on the Common Component Architecture (CCA; Armstrong et al., 1999). The CCA can create generic calling interfaces for components based on wrappers, and can work with a variety of different tools underneath, including ESMF and MCT. The CSDMS also includes a graphical user interface to link different modeling system components. This system displays many advantages of the framework approach. However, metadata must still be captured manually and is not integrated into the modeling system.

In contrast to a modeling framework approach, scientific workflow systems create generic interfaces to a variety of technologies such as job schedulers, authentication mechanisms and data transfer protocols, and automate the execution and monitoring of a heterogeneous workflow (Altintas et al., 2006). A scientific workflow system is defined as a problem-solving environment that simplifies tasks by creating meaningful, easily understandable sub-tasks/modules and combining them to form executable data management and analysis pipelines (Bowers and Ludäscher, 2005; Ludäscher et al., 2006). It acts as an abstraction layer that hides the details of the computing environment, modeling system and external applications (e.g. pre-/post-processing and visualization applications) from the user. Through their use, interoperability of different technologies can be achieved to create an easy-to-use work environment for Earth system modelers.
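The way a workflow system combines small sub-tasks into an executable pipeline can be sketched in a few lines of Python. This is a toy illustration, not the API of Kepler or of any other workflow application: the task names, the shared `state` dictionary and the `run_workflow` helper are all hypothetical.

```python
def preprocess(state):
    """Stand-in for preparing the model input data."""
    state["input_ready"] = True
    return state


def run_model(state):
    """Stand-in for submitting and running the simulation."""
    if not state.get("input_ready"):
        raise RuntimeError("model started before preprocessing finished")
    state["output"] = "simulation_output"
    return state


def postprocess(state):
    """Stand-in for analysis or visualization of the model output."""
    state["figure"] = "plot_of_" + state["output"]
    return state


def run_workflow(tasks, state=None):
    """Run named tasks in order and keep a log of what was executed."""
    state = dict(state or {})
    log = []
    for name, task in tasks:
        state = task(state)
        log.append(name)
    return state, log


final_state, executed = run_workflow(
    [("preprocess", preprocess), ("run_model", run_model), ("postprocess", postprocess)]
)
```

The user of such a pipeline sees only the task names and their order; how each task talks to a scheduler or a file system stays behind the task boundary, which is the abstraction-layer role described above.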

Several scientific workflow applications exist, such as Kepler (Ludäscher et al., 2006), Taverna (Oinn et al., 2004), Triana (Majithia et al., 2004), Trident (Barga et al., 2008) and VisTrails (Bavoil et al., 2005). In addition to standalone scientific workflow systems, the Linked Environments for Atmospheric Discovery (LEAD; Plale et al., 2006) project demonstrates how workflows can be used to solve problems specific to Earth system science by integrating various technologies such as web and grid services, metadata repositories, and workflow systems. The LEAD portal integrates a sophisticated set of tools to enable users to access, analyze, run, and visualize meteorological data and forecast models, facilitating an interactive study of weather. In the LEAD portal, the XBaya workflow composer is used to generate BPEL (Business Process Execution Language) workflows and compose web services. After the success of the NSF-funded LEAD project, LEAD II was designed as a follow-on project. As such, the LEAD project is an important example that illustrates how scientific workflows can be used to simplify interfaces to Earth science applications and how these different technologies can be used to address a specific problem that includes many different tasks. While workflows have shown great value, a common disadvantage of the approach is that creating connections between models and the scientific workflow application is not straightforward if the modeling applications used do not have common interfaces. As described earlier in this section, the framework approach might help to design Earth system models with common interfaces that can be used to create standardized connections among models and scientific workflow applications.

The scientific workflow system also creates a work environment in which one can collect and archive provenance information about specific model runs. Provenance is defined as structured information that keeps track of the origin and derivation of the modeling applications (Bowers et al., 2006; Klasky et al., 2008). The collected information can be used to compare, reproduce, tune, debug or validate a specific set of simulation runs. The combination of provenance information and metadata collected from the modeling system itself can also be used to create self-describing Earth system models. The stored provenance and metadata information can be parsed and queried to reproduce the work environment (operating system, compilers, model and its configurations) that produced a particular result.

Provenance information has been categorized as system, workflow, data and process provenance (Bowers et al., 2006; Klasky et al., 2008). Briefly, system provenance is defined as data about the computing environment or host system in which the job executes. It includes information about the operating system, the environment variables and libraries that are used and/or defined by the application, and the system software and model versions. With this information, the user can reproduce the work environment later on. Workflow provenance, in turn, is the data about the evolution and structure of the workflow itself, i.e. different versions, designs, and internal structures of the workflow. The collected workflow provenance information can be stored as an incremental change or as a complete new version, but the main structure of the data and the storage type (e.g. relational database, ASCII, XML) can vary based on the provenance collection tool and the scientific workflow application itself. The Kepler and VisTrails workflow applications mentioned above are able to capture workflow provenance at different levels of detail, but the proposed coupled modeling system needs additional tools to collect system and process provenance information from the model itself. In this study, we are only interested in recording system and workflow provenance information related to model simulation, not data and process provenance in the conventional sense.
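As a rough illustration of what a system provenance record might contain, the Python sketch below gathers a few facts about the host using only the standard library. It is not the collection tool used in this study: the function name, the record keys and the default list of environment variables are assumptions made for the example.

```python
import json
import os
import platform
import sys


def collect_system_provenance(env_names=("PATH", "LD_LIBRARY_PATH")):
    """Gather basic facts about the host; the keys used here are hypothetical."""
    return {
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "hostname": platform.node(),
        "python_version": platform.python_version(),
        "executable": sys.executable,
        # Variables that are not set are recorded as None rather than skipped.
        "environment": {name: os.environ.get(name) for name in env_names},
    }


provenance = collect_system_provenance()

# Serialize the record so that it can be archived and queried later.
serialized = json.dumps(provenance, indent=2, sort_keys=True)
```

A real collector would also need the compiler, library and model versions mentioned above, which depend on the site and the build system and are therefore left out of this sketch.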

The purpose of this paper is to demonstrate the advantages of using a modeling framework and a scientific workflow approach together to create a self-describing Earth system modeling workflow, which provides capabilities that existing modeling workflow systems do not. The proposed modeling system and work environment aim to increase the interoperability of different technologies. The automatic collection of metadata and provenance information also simplifies the interaction between the user and the modeling system. The provenance information enables users to trace back from the results to debug and reproduce the simulation. Our contribution is a methodology for integrating all the necessary pieces to create this new type of workflow. To test the capabilities of this modeling environment, we use a particular instance of a complex model integration problem that includes three main components: an Earth system model, a scientific workflow application and a coupler library to create a coupled modeling system and collect metadata information from the models that were used.

The rest of this paper is organized as follows: in the next section, we describe the proposed methodology used to create a self-describing coupled modeling system and embed it in a scientific workflow application. The next section also introduces the workflow application, outlining the components and describing a prototype system that was constructed using the proposed methodology. In Section 3, we present detailed information about the use case application and its components. In Sections 4 and 5, we present the results, conclude, and discuss possible future directions.

Section snippets

Methodology

This section gives a brief description of the proposed system, beginning with the details of each component (such as metadata and provenance collection mechanisms) and leading to the creation of a self-describing coupled modeling system. Later on, the general structure of the system and its components are explained. The reader should also note that the example application defines two Earth subsystem models (indicated as Comp I and Comp II in Fig. 1) and a coupler component (shown as a Coupler in

Use case and components

Coupled modeling systems, in particular Global Circulation Models (GCMs), are widely used to study the effects of climate change on a global scale. One of the most important problems that climate scientists face is that the resolutions of GCMs are too coarse to assess the regional effects of climate change. To overcome this problem, regional climate models are used to downscale GCM output to produce higher-resolution (both spatial and temporal) representations of regional effects.

Results

As mentioned in previous sections, using the ESMF library to couple the WRF and ROMS models yields a coupled modeling system with common component interfaces, to which additional Earth system models such as a Land Information System (LIS; Kumar et al., 2006) can be added easily. In this case, synchronization and data exchange among different model components are handled by the ESMF framework. Usage of XML technologies in the model configuration files might also help to design more generic and

Conclusion and future work

In this paper, we demonstrate the viability of using framework and workflow approaches together to create a self-describing modeling system with common component interfaces and an execution environment that is specialized for Earth system-related applications. The results show that the developed workflow environment facilitates integration of different components of the modeling system and also enables an easy-to-use and efficient work environment. The workflow system basically acts as an

Acknowledgments

The authors wish to thank John Michalakes and Qui Xin from NCAR for very useful suggestions and comments and Shaowu Bao from NOAA ESRL for sharing his model coupling experience and studies. The authors extend special thanks to Mladen Vouk, Pierre Mouallem and Meiyappan Nagappan from North Carolina State University and Norbert Podhorszki and Scott Klasky from Oak Ridge National Laboratory for sharing their experience about collecting system provenance information. This work is funded by Istanbul

References (43)

  • S. Bowers et al. Actor-oriented design of scientific workflows.
  • S. Bowers et al. A model for user-oriented data provenance in pipelined scientific workflows.
  • B. Cao et al. Provenance information model of Karma version 3.
  • N. Collins et al. Design and implementation of components in the Earth System Modeling Framework. International Journal of High Performance Computing Applications (2005).
  • W.D. Collins et al. The community climate system model version 3 (CCSM3). Journal of Climate (2006).
  • D.P. Dee et al. The ERA-Interim reanalysis: configuration and performance of the data assimilation system. Quarterly Journal of the Royal Meteorological Society (2011).
  • J. Eker et al. Taming heterogeneity – the Ptolemy approach. Proceedings of the IEEE (2003).
  • J. Frew et al. Automatic capture and reconstruction of computational provenance. Concurrency and Computation: Practice and Experience (2008).
  • G.A. Grell et al. A Description of the Fifth-generation Penn State/NCAR Mesoscale Model (MM5), NCAR/TN-398+STR (1995).
  • C. Hill et al. The architecture of the Earth System Modeling Framework. Computing in Science and Engineering (2004).
  • C. Hill et al. Implementing Applications with the Earth System Modeling Framework. Lecture Notes in Computer Science 3732 (2006).

Thematic Issue on the Future of Integrated Modeling Science and Technology.