Automated discovery of a model for dinoflagellate dynamics

https://doi.org/10.1016/j.envsoft.2010.11.003

Abstract

The aim of this paper is to discover a model equation for predicting the concentration of the algal species Peridinium gatunense (Dinoflagellate) in Lake Kinneret. This is a rather difficult task because of the sudden ecosystem changes that occurred in the mid-1990s: the stable ecosystem (with regular Peridinium blooms until 1993) has transformed into an unstable system in which cyanobacterial blooms now occur regularly. This shift in the algal succession is expected to influence attempts to model the lake ecosystem, since the model structure before and after the change is likely to be different. Our modelling experiments were directed at discovering a single model equation that can simulate dinoflagellate dynamics in both periods. We apply an automated modelling tool (Lagramge), which integrates the knowledge-driven and data-driven modelling approaches. In addition, we include an expert’s visual estimation of the models discovered by Lagramge to assist in the selection of the optimal model. The dataset included time-series measurements of typical data from the periods 1988 to 1992 and 1997 to 1999. Using the data and expert knowledge coded in a modelling knowledge library, Lagramge successfully discovered several suitable mathematical models for Peridinium. After the expert’s visual estimation and validation of the models, we propose one optimal model capable of long-term predictions.

Research highlights

► An automated modelling method was applied to discover a dinoflagellate model in Lake Kinneret.
► The method integrates theoretical and empirical modelling approaches.
► The discovered model is in line with the theoretical background knowledge.
► The model successfully simulates the dinoflagellate over a period of 7 years.

Introduction

Lake Kinneret is the only natural freshwater lake in Israel, not only providing some 30% of the country’s drinking water but also serving as a key recreational site. Routine monitoring of the lake ecosystem has been conducted since 1969, and until 1993 the ecosystem exhibited noticeable stability in its key characteristics (Berman et al., 1995). One key element was the annual spring bloom of the large dinoflagellate, Peridinium gatunense (Zohary, 2004). Through to the early 1990s, P. gatunense was the dominant algal species in the lake, often reaching over 90% of the algal biomass. Since the mid-1990s, however, a change has occurred in the lake, leading to the appearance of cyanobacterial blooms and to years in which no Peridinium bloom was detected (Zohary, 2004). Hence, unlike the period prior to 1994, it is no longer possible to predict which species will bloom or when. The reasons for the shift in algal succession are still unclear, as are the reasons for the observed large inter-annual variation in Peridinium biomass (Roelke et al., 2007).

Modelling such a system can be very useful, but it is also a very difficult task. Mechanistic ecological models, such as DYRESM-CAEDYM, have been applied to simulate the seasonal dynamics of nutrients and of multiple phytoplankton and zooplankton groups (Bruce et al., 2006). The model is fairly complex and, like most mechanistic ecological models, structurally fixed, so it cannot react to changes in the ecosystem structure.

Complex models are difficult to cope with, in particular when attempting to perform long-term simulations. On the other hand, no matter how complex a model is, it is still a rough simplification of reality, and this simplification is absorbed into the model’s parameters. Estimating the values of these parameters is another, even more difficult task, since most of the time we are dealing with over-parameterised models, i.e. models for which there is no single (unique) set of parameter values. There are therefore clear advantages to simplifying mathematical models as much as possible. In this context, Jakeman et al. (2006) suggested a 10-step procedure for the development of good models. Applications of this procedure can be found in Robson et al. (2008) and in Welsh (2008). Crout et al. (2009) suggest a methodology for model simplification (reduction of complexity), which extends the one previously suggested by Cox et al. (2006). Applying this methodology to real examples of existing process-based models demonstrated that all of the models could be used in their reduced versions.

In this paper we are not dealing with the simplification of existing models but rather with the direct discovery of the optimal (already simplified) model structure. To do so, we apply an automated knowledge discovery tool called Lagramge (Džeroski and Todorovski, 2003) in order to determine the optimum model structure that can be used for conceptual modelling of the Peridinium population. Lagramge is a machine learning (ML, hereafter) algorithm based on equation discovery. Unlike the majority of ML algorithms, which are purely data-driven, i.e. they induce models from measured data only, Lagramge is capable of including background expert knowledge in the procedure of model induction from data. Domain knowledge is introduced in the form of generic processes. Todorovski (2003) developed a formalism for encoding process-based domain knowledge into a modelling library. Using this formalism, Atanasova et al. (2006a) elaborated a knowledge library for modelling of lake ecosystems. Using the library and the automated modelling method, Lagramge has constructed models of several real-world domains, e.g. Lake Bled, Slovenia (Atanasova et al., 2006b), Lake Kasumigaura, Japan (Atanasova et al., 2006c), Lake Glumsø, Denmark (Atanasova et al., 2008), and the Lagoon of Venice, Italy (Atanasova, 2005).

The goal of this research is to discover an optimal model equation for Peridinium dynamics, which can (a) provide some indications as to which factors should be included in our models, and how to include them, i.e. suggest the necessary model complexity, and (b) suggest whether a single model structure (and set of parameters’ values) is sufficient to model the Peridinium dynamics prior to, and after, the algal shift in the lake. In order to address these issues we model the Peridinium population prior to, and following, the changes that occurred in the lake in the early 1990s. Specifically, we compare the periods 1988–1992 and 1997–1999.

The remainder of this paper is organised as follows. In the next section we explain the automated modelling method Lagramge used for model discovery from measured data and the modelling background knowledge. Section 3 describes the Lake Kinneret dataset. The experimental setup for Peridinium model discovery is given in Section 4, followed by results and discussion in Sections 5 and 6. Finally, the conclusions are presented in Section 7.

Section snippets

Automated modelling with Lagramge

In this section, we present the method Lagramge, used for automated model discovery from both measured data and background knowledge. We first explain the motivation for using Lagramge compared to the theoretical (or conceptual) approach to modelling, then we explain how Lagramge works, i.e., how it introduces the background knowledge (stored in a modelling knowledge library) into the procedure for model discovery from data, and finally we present a segment of the modelling knowledge library
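The general idea of combining a library of alternative process formulations with measured data can be sketched in a few lines. The following is a toy illustration only, not Lagramge's actual search procedure or library formalism: the mini-library `LIBRARY`, the half-saturation constant, the exhaustive enumeration, and the coarse grid search over rate constants are all assumptions made for this example.

```python
import itertools

HALF_SATURATION = 0.5  # assumed value, purely illustrative

# Hypothetical mini-library: each process has alternative formulations.
# Every function maps (biomass p, nutrient n, rate constant k) to a rate.
LIBRARY = {
    "growth": {
        "first_order": lambda p, n, k: k * p,
        "monod_limited": lambda p, n, k: k * p * n / (n + HALF_SATURATION),
    },
    "loss": {
        "first_order": lambda p, n, k: k * p,
        "second_order": lambda p, n, k: k * p * p,
    },
}


def simulate(growth_fn, loss_fn, k_growth, k_loss, p0, nutrient, dt=1.0):
    """Forward-Euler simulation of dp/dt = growth - loss, one step per nutrient sample."""
    p, out = p0, [p0]
    for n in nutrient[:-1]:
        p = max(p + dt * (growth_fn(p, n, k_growth) - loss_fn(p, n, k_loss)), 0.0)
        out.append(p)
    return out


def sse(simulated, observed):
    """Sum of squared errors between simulated and observed biomass."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))


def discover(observed, nutrient, rate_grid):
    """Try every structure in the library with every pair of rate constants
    from rate_grid and return (error, growth name, loss name, k_growth, k_loss)
    for the best-fitting candidate."""
    best = None
    structures = itertools.product(LIBRARY["growth"].items(), LIBRARY["loss"].items())
    for (g_name, g_fn), (l_name, l_fn) in structures:
        for k_g, k_l in itertools.product(rate_grid, repeat=2):
            err = sse(simulate(g_fn, l_fn, k_g, k_l, observed[0], nutrient), observed)
            if best is None or err < best[0]:
                best = (err, g_name, l_name, k_g, k_l)
    return best
```

Given a biomass series and a matching nutrient series, `discover(observed, nutrient, [0.05, 0.1, 0.2, 0.4])` returns the structure and rate constants with the lowest error. A real tool would replace the grid search with proper parameter optimisation and would constrain the candidate space using the encoded domain knowledge.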

Lake Kinneret dataset

The data used were collected by the Kinneret Limnological Laboratory (KLL) staff and extracted from the Laboratory’s database (Kinneret Limnological Laboratory, 2001). The lake has been studied since 1969 and a wide range of physical, biological and chemical variables has been monitored routinely ever since (Berman et al., 1995). Data used included hydrological information such as inflow volumes and nutrient loading (ammonium – NH4, nitrate – NO3, dissolved organic nitrogen – DON, total

Experiments

In accordance with our goal, i.e., to discover a good model for Peridinium dynamics, we formulated three experiments that correspond to three different modelling tasks for Lagramge. Our general approach was to discover a structure that is in line with existing theoretical modelling knowledge and expert knowledge about this particular case study. Thus, the general structure of the model to be discovered is presented in Eq. (6).

$$\frac{\partial \mathit{pp}}{\partial t} = \text{growth} - \text{respiration} - \text{mortality} \qquad (6)$$
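As a rough illustration of the balance structure behind Eq. (6), the sketch below integrates a biomass equation in which growth is offset by respiration and mortality losses. Reading the equation with both losses subtracted is an assumption about its signs. The function `simulate_biomass`, the first-order form of each process, and all rate values are likewise hypothetical; the forms actually selected in the paper are not reproduced here.

```python
def simulate_biomass(p0, growth_rate, respiration_rate, mortality_rate, days, dt=0.1):
    """Forward-Euler integration of dp/dt = growth - respiration - mortality.

    Each process is assumed to be first-order in biomass p (illustrative
    choice only). Rates are per day; the biomass at every step is returned.
    """
    steps = int(round(days / dt))
    p = p0
    trajectory = [p]
    for _ in range(steps):
        growth = growth_rate * p
        respiration = respiration_rate * p
        mortality = mortality_rate * p
        p = p + dt * (growth - respiration - mortality)
        trajectory.append(p)
    return trajectory


# Example: biomass grows when growth exceeds the combined losses.
bloom = simulate_biomass(p0=1.0, growth_rate=0.3, respiration_rate=0.1,
                         mortality_rate=0.1, days=10, dt=0.01)
```

With these assumed first-order terms the population grows or declines exponentially depending on the sign of the net rate. Discovering which functional forms (for example, temperature or nutrient limitation of growth) should replace the constant rates is exactly the kind of structural choice the modelling tasks address.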

The task at hand is to find

Results

In this section we present the optimal Peridinium model resulting from the induction and evaluation procedure described in Section 4. Given the nine data (sub)sets and the three modelling tasks, Lagramge returned 27 sets of models, one for each pair of dataset and modelling task. More specifically, given the modelling knowledge library and the modelling task specifications, the space of candidate models includes 44,352 model structures for modelling task 1, 12,960 structures for the

Discussion

In this research, we employ an automated modelling (AM) method that combines modelling knowledge and measured data to find an optimal model in a given modelling scenario. We further combine the output of the tool with a domain expert’s evaluation of models to find a model of Peridinium dynamics in Lake Kinneret. In terms of integrating domain knowledge into the modelling process, the presented work is related to several other studies.

Whigham (1995) proposed an introduction of the domain knowledge

Conclusions

In this paper we successfully discovered a long-term model for Peridinium dynamics in Lake Kinneret. This is of special importance, since the lake has undergone ecosystem changes, causing a shift in algal succession. The previously stable ecosystem with regular blooms of Peridinium has changed into an unstable ecosystem with recurring cyanobacterial blooms. In contrast to the expectations that one model structure could not describe the Peridinium dynamics before and after the change, we

References (37)

  • B.J. Robson et al. Ten steps applied to development and evaluation of process-based biogeochemical models of estuaries. Environmental Modelling and Software (2008)
  • D. Scavia et al. Documentation of selected constructs and parameter values in the aquatic model CLEANER. Ecological Modelling (1976)
  • D. Scavia. An ecological model of Lake Ontario. Ecological Modelling (1980)
  • A. Vardi et al. Programmed cell death of the dinoflagellate Peridinium gatunense is mediated by CO2 limitation and oxidative stress. Current Biology (1999)
  • W.D. Welsh. Water balance modelling in Bowen, Queensland, and the ten iterative steps in model development and evaluation. Environmental Modelling and Software (2008)
  • N. Atanasova. Preparation and use of the domain expert knowledge for automated modelling of aquatic ecosystems.... (2005)
  • N. Atanasova et al. Computational assemblage of ordinary differential equations for chlorophyll-a using a lake process equation library and measured data of Lake Kasumigaura
  • J. Bendorf. A contribution to the phosphorus loading concept. Internationale Revue der gesamten Hydrobiologie und Hydrographie (1979)