Automated discovery of a model for dinoflagellate dynamics
Research highlights
► An automated modelling method was applied to discover a dinoflagellate model for Lake Kinneret. ► The method integrates theoretical and empirical modelling approaches. ► The discovered model is in line with theoretical background knowledge. ► The model successfully simulates the dinoflagellate over a period of 7 years.
Introduction
Lake Kinneret is the only natural freshwater lake in Israel, not only providing some 30% of the country’s drinking water but also serving as a key recreational site. Routine monitoring of the lake ecosystem has been conducted since 1969, and until 1993 the ecosystem exhibited noticeable stability in its key characteristics (Berman et al., 1995). One key element was the annual spring bloom of the large dinoflagellate Peridinium gatunense (Zohary, 2004). Through to the early 1990s, P. gatunense was the dominant algal species in the lake, often reaching over 90% of the algal biomass. Since the mid-1990s, however, a change has occurred in the lake, leading to the appearance of cyanobacterial blooms and to years in which no Peridinium bloom was detected (Zohary, 2004). Hence, unlike the period prior to 1994, it is no longer possible to predict which species will bloom or when. The reasons for the shift in algal succession are still unclear, as are the reasons for the large observed inter-annual variation in Peridinium biomass (Roelke et al., 2007).
Modelling such a system can be very useful, but it is also a very difficult task. Mechanistic ecological models, such as DYRESM-CAEDYM, have been applied to simulate the seasonal dynamics of nutrients and of multiple phytoplankton and zooplankton groups (Bruce et al., 2006). The model is fairly complex and, like most mechanistic ecological models, structurally fixed, so it cannot react to changes in the ecosystem structure.
Complex models are difficult to cope with, in particular when attempting to perform long-term simulations. On the other hand, no matter how complex a model is, it is still a rough simplification of reality, and this simplification is absorbed into the model’s parameters. Estimating the values of these parameters is another, even more difficult task, since most of the time we are dealing with over-parameterised models, for which no single (unique) set of parameter values exists. There are therefore clear advantages to simplifying mathematical models as much as possible. In this context, Jakeman et al. (2006) suggested a 10-step procedure for the development of good models. Applications of this procedure can be found in Robson et al. (2008) and in Welsh (2008). Crout et al. (2009) suggest a methodology for model simplification (reduction of complexity), which extends the one previously suggested by Cox et al. (2006). Applying this methodology to real examples of existing process-based models demonstrated that all of the models could be used in their reduced versions.
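The non-uniqueness of parameter values in an over-parameterised model can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes a hypothetical one-state biomass model, dB/dt = (growth - mortality) * B, in which biomass data can constrain only the difference between the two rates, so different parameter pairs reproduce the same trajectory.

```python
import math

def simulate(growth, mortality, b0=1.0, dt=0.1, steps=50):
    """Closed-form solution of dB/dt = (growth - mortality) * B on a time grid.

    The model, parameter names, and default values are illustrative assumptions.
    """
    net_rate = growth - mortality
    return [b0 * math.exp(net_rate * dt * k) for k in range(steps + 1)]

# Two different parameter sets that share the same net rate (about 0.2).
run_a = simulate(growth=0.30, mortality=0.10)
run_b = simulate(growth=0.50, mortality=0.30)

# Largest difference between the two simulated biomass trajectories.
max_gap = max(abs(x - y) for x, y in zip(run_a, run_b))
print(f"largest difference between trajectories: {max_gap:.2e}")
```

Because the two runs are indistinguishable up to floating-point rounding, no calibration against biomass alone could prefer one parameter set over the other, which is one argument for reducing such a model to a single net-rate parameter.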
In this paper we are not dealing with the simplification of existing models but rather with the direct discovery of the optimal (already simplified) model structure. To do so, we apply an automated knowledge discovery tool called Lagramge (Džeroski and Todorovski, 2003) to determine the optimal model structure for conceptual modelling of the Peridinium population. Lagramge is a machine learning (ML, hereafter) algorithm based on equation discovery. Unlike the majority of ML algorithms, which are purely data-driven, i.e. they induce models from measured data only, Lagramge can include background expert knowledge in the procedure of model induction from data. Domain knowledge is introduced in the form of generic processes. Todorovski (2003) developed a formalism for encoding process-based domain knowledge into a modelling library. Using this formalism, Atanasova et al. (2006a) elaborated a knowledge library for modelling lake ecosystems. Using the library and the automated modelling method, Lagramge has constructed models of several real-world domains, e.g. Lake Bled, Slovenia (Atanasova et al., 2006b), Lake Kasumigaura, Japan (Atanasova et al., 2006c), Lake Glumsø, Denmark (Atanasova et al., 2008), and the Lagoon of Venice, Italy (Atanasova, 2005).
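The general idea of inducing a model from both a library of generic processes and measured data can be sketched as follows. This is a minimal illustration only and does not reproduce Lagramge's formalism or search strategy: the process library, the rate grid, the half-saturation constant, the Euler integration, and the synthetic "observations" are all assumptions made for the example.

```python
import itertools

# Hypothetical library of generic processes (not the library used in the paper).
GROWTH = {
    "exponential": lambda b, n, r: r * b,
    "monod": lambda b, n, r: r * b * n / (n + 0.5),  # 0.5: assumed half-saturation
}
LOSS = {
    "linear": lambda b, r: r * b,
    "quadratic": lambda b, r: r * b * b,
}
RATES = [0.1, 0.2, 0.3, 0.4, 0.5]  # coarse grid standing in for parameter fitting

def simulate(growth, loss, g_rate, l_rate, nutrient, b0=1.0, dt=0.1):
    """Euler integration of dB/dt = growth - loss, driven by a nutrient series."""
    b, out = b0, [b0]
    for n in nutrient:
        b = b + dt * (growth(b, n, g_rate) - loss(b, l_rate))
        out.append(b)
    return out

def sse(simulated, observed):
    """Sum of squared errors between two equally long series."""
    return sum((x - y) ** 2 for x, y in zip(simulated, observed))

def discover(observed, nutrient):
    """Try every structure and rate pair; return the candidate with the lowest error."""
    best = None
    for (g_name, g), (l_name, l) in itertools.product(GROWTH.items(), LOSS.items()):
        for g_rate, l_rate in itertools.product(RATES, RATES):
            err = sse(simulate(g, l, g_rate, l_rate, nutrient), observed)
            if best is None or err < best[0]:
                best = (err, g_name, l_name, g_rate, l_rate)
    return best

# Synthetic data generated from a known structure, so the search can be checked.
nutrient = [1.0 + 0.5 * ((k % 20) / 20) for k in range(100)]
observed = simulate(GROWTH["monod"], LOSS["quadratic"], 0.4, 0.1, nutrient)
best = discover(observed, nutrient)
print(best)
```

The library restricts the search to structures that an expert would consider meaningful, while the data decide among them. A real application would replace the rate grid with proper parameter estimation and would evaluate candidates on data not used for fitting.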
The goal of this research is to discover an optimal model equation for Peridinium dynamics, which can (a) provide some indications as to which factors should be included in our models, and how to include them, i.e. suggest the necessary model complexity, and (b) suggest whether a single model structure (and set of parameters’ values) is sufficient to model the Peridinium dynamics prior to, and after, the algal shift in the lake. In order to address these issues we model the Peridinium population prior to, and following, the changes that occurred in the lake in the early 1990s. Specifically, we compare the periods 1988–1992 and 1997–1999.
The remainder of this paper is organised as follows. In the next section we explain the automated modelling method Lagramge used for model discovery from measured data and the modelling background knowledge. Section 3 describes the Lake Kinneret dataset. The experimental setup for Peridinium model discovery is given in Section 4, followed by results and discussion in Sections 5 and 6. Finally, the conclusions are presented in Section 7.
Section snippets
Automated modelling with Lagramge
In this section, we present the method Lagramge, used for automated model discovery from both measured data and background knowledge. We first explain the motivation for using Lagramge compared to the theoretical (or conceptual) approach to modelling, then we explain how Lagramge works, i.e., how it introduces the background knowledge (stored in a modelling knowledge library) into the procedure for model discovery from data, and finally we present a segment of the modelling knowledge library
Lake Kinneret dataset
The data used were collected by the Kinneret Limnological Laboratory (KLL) staff and extracted from the Laboratory’s database (Kinneret Limnological Laboratory, 2001). The lake has been studied since 1969 and a wide range of physical, biological and chemical variables has been monitored routinely ever since (Berman et al., 1995). Data used included hydrological information such as inflow volumes and nutrient loading (ammonium – NH4, nitrate – NO3, dissolved organic nitrogen – DON, total
Experiments
In accordance with our goal, i.e., to discover a good model for Peridinium dynamics, we formulated three experiments that correspond to three different modelling tasks for Lagramge. Our general approach was to discover a structure that is in line with existing theoretical modelling knowledge and expert knowledge about this particular case study. Thus, the general structure of the model to be discovered is presented in Eq. (6).
The task at hand is to find
Results
In this section we present the optimal Peridinium model resulting from the induction and evaluation procedure described in Section 4. Given the nine data (sub)sets and the three modelling tasks, Lagramge returned 27 sets of models, one for each pair of dataset and modelling task. More specifically, given the modelling knowledge library and the modelling task specifications, the space of candidate models includes 44,352 model structures for the modelling task 1, 12,960 structures for the
Discussion
In this research, we employ an automated modelling (AM) method that combines modelling knowledge and measured data to find an optimal model in a given modelling scenario. We further combine the output of the tool with a domain expert’s evaluation of models to find a model of Peridinium dynamics in Lake Kinneret. In terms of integrating domain knowledge into the modelling process, the presented work is related to several other studies.
Whigham (1995) proposed an introduction of the domain knowledge
Conclusions
In this paper we successfully discovered a long-term model for Peridinium dynamics in Lake Kinneret. This is of special importance, since the lake has undergone ecosystem changes, causing a shift in algal succession. The previously stable ecosystem with regular blooms of Peridinium has changed into an unstable ecosystem with recurring cyanobacterial blooms. In contrast to the expectations that one model structure could not describe the Peridinium dynamics before and after the change, we
References (37)
- et al. Constructing a library of domain knowledge for automated modelling of aquatic ecosystems. Ecological Modelling (2006)
- et al. Automated modelling of a food web in lake Bled using measured data and a library of domain knowledge. Ecological Modelling (2006)
- et al. Application of automated model discovery from data and expert knowledge to a real-world domain: Lake Glumsø. Ecological Modelling (2008)
- et al. A numerical simulation of the role of zooplankton in C, N and P cycling in Lake Kinneret, Israel. Ecological Modelling (2006)
- et al. Towards the systematic simplification of mechanistic models. Ecological Modelling (2006)
- et al. Is my model too complex? Evaluating model formulation using model reduction. Environmental Modelling and Software (2009)
- et al. Learning population dynamics models from data and domain knowledge. Ecological Modelling (2003)
- et al. Implementation of ecological modeling as an effective management and investigation tool: Lake Kinneret as a case study. Ecological Modelling (2009)
- et al. Ten iterative steps in development and evaluation of environmental models. Environmental Modelling and Software (2006)
- A simulation model for phytoplankton growth and nutrient cycling in eutrophic, shallow lakes. Ecological Modelling (1978)
- Ten steps applied to development and evaluation of process-based biogeochemical models of estuaries. Environmental Modelling and Software
- Documentation of selected constructs and parameter values in the aquatic model cleaner. Ecological Modelling
- An ecological model of Lake Ontario. Ecological Modelling
- Programmed cell death of the dinoflagellate Peridinium gatunense is mediated by CO2 limitation and oxidative stress. Current Biology
- Water balance modelling in Bowen, Queensland, and the ten iterative steps in model development and evaluation. Environmental Modelling and Software
- Computational assemblage of ordinary differential equations for chlorophyll-a using a lake process equation library and measured data of Lake Kasumigaura
- A contribution to the phosphorus loading concept. Internationale Revue der gesamten Hydrobiologie und Hydrographie