Evaluating machine-learning techniques for recruitment forecasting of seven North East Atlantic fish species
Introduction
Early in fisheries research, recruitment was identified as a key element in management. As a result, recruitment and the factors that determine it have been the subject of intense research (e.g. Cushing, 1971, Myers et al., 1995, Ricker, 1954, Rothschild, 2000). Such research has evolved from considering only the biomass of spawners to also including environmental factors that can modulate recruitment (e.g. Planque and Buffaz, 2008, Schirripa and Colbert, 2006). From a data analysis perspective, the main limitation to achieving good forecasts is the sparse and ‘noisy’ nature of the available data (Fernandes et al., 2010, Francis, 2006).
A further problem is that data on some of the factors that may control recruitment directly (e.g. food availability, larval growth) can be more laborious to obtain than the recruitment estimate itself (Irigoien et al., 2009, Zarauz et al., 2008, Zarauz et al., 2009). As a simplified approach, fisheries management has been moving towards relationships with oceanographic data, which are collected routinely, as proxies of recruitment conditions (Bartolino et al., 2008, Borja et al., 2008, De Oliveira et al., 2005). Nevertheless, the problem remains difficult because the mechanisms behind such relationships are often poorly understood. This, in turn, makes it difficult to assess the robustness of the forecasts, and some proposed relationships, methods and performance estimates have failed once new data became available (Myers et al., 1995). Such failures may be related to new controlling factors that were not previously considered (Myers et al., 1995, Planque and Buffaz, 2008), or to limitations in the available data (Schirripa and Colbert, 2006).
Recruitment forecasting is a problem of high uncertainty (Mäntyniemi et al., in press). Machine-learning techniques have been proposed as an appropriate approach to such problems, with several desirable properties (Dreyfus-León and Chen, 2007, Dreyfus-León and Schweigert, 2008, Fernandes et al., 2010, Fernandes et al., 2013, Uusitalo, 2007). In this study, an updated version of a previously proposed machine-learning framework (Fernandes et al., 2010) is applied to several North East Atlantic species of commercial interest that share spawning and nursery areas at the shelf break (Ibaibarriaga et al., 2007, Sagarminaga and Arrizabalaga, 2010). The main properties of this methodology are: (i) forecasts come with an estimate of their uncertainty; (ii) forecasts and scenarios are easy to interpret; (iii) the boundaries of recruitment levels and factors can be easily interpreted; (iv) the selected factors are highly stable under a leave-one-out scheme; (v) errors are balanced across all recruitment levels; and (vi) performance estimation is robust and honest.
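Property (iv) can be made concrete: if factor selection is repeated inside every leave-one-out fold, the stability of a factor can be measured as the fraction of folds in which it is chosen. The sketch below assumes a hypothetical filter that keeps the k factors with the largest absolute Pearson correlation to the recruitment series. The framework's actual selection criterion is not described in this section, and the factor names and values are illustrative only.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_top_k(factors, target, k):
    """Hypothetical filter: keep the k factors with the largest |r| to the target."""
    scores = {name: abs(pearson(values, target)) for name, values in factors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def loo_selection_frequency(factors, target, k):
    """Fraction of leave-one-out folds in which each factor is selected."""
    n = len(target)
    counts = Counter()
    for i in range(n):
        # Drop year i from every factor and from the target, then re-run selection.
        train_factors = {name: v[:i] + v[i + 1:] for name, v in factors.items()}
        train_target = target[:i] + target[i + 1:]
        counts.update(select_top_k(train_factors, train_target, k))
    return {name: counts[name] / n for name in factors}


# Hypothetical example: ten years of a recruitment index and three candidate factors.
target = [float(t) for t in range(1, 11)]
factors = {
    "factor_a": [2.0 * t for t in target],          # tracks recruitment closely
    "factor_b": [(-1.0) ** t for t in range(10)],   # alternating noise
    "factor_c": [5.0, 3.0, 8.0, 1.0, 9.0, 2.0, 7.0, 4.0, 6.0, 10.0],
}
freq = loo_selection_frequency(factors, target, k=1)
```

A factor with a selection frequency close to 1 is one whose selection does not hinge on any single year of data; frequencies spread thinly across many factors would signal an unstable selection.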
Within this context, this work has three aims: (i) to identify factors for forecasting the recruitment of North East Atlantic species that share spawning and nursery areas; (ii) to modify the previous framework with a novel model in order to produce more accurate probabilistic forecasts; and (iii) to compare goodness-of-fit with generalization power in order to assess the reliability of the final forecasting models. This comparison is necessary because the methods used are non-parametric and may over-fit the data. All three objectives are crucial to producing reliable forecasts that can support decision making in the fisheries management of species that share spawning and nursery areas.
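The gap between goodness-of-fit and generalization power targeted by aim (iii) can be illustrated with any non-parametric classifier. The sketch below swaps in a 1-nearest-neighbour classifier, which is not one of the classifiers used in this work. It compares resubstitution accuracy, where every record is predicted by a model trained on all records including itself, with leave-one-out accuracy, where the record being predicted is withheld. The data are synthetic, with two deliberately mislabelled records standing in for noise.

```python
def nn1_predict(train_x, train_y, x):
    """1-nearest-neighbour prediction (squared Euclidean distance); a stand-in classifier."""
    best = min(range(len(train_x)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(train_x[j], x)))
    return train_y[best]


def resubstitution_accuracy(xs, ys):
    """Goodness-of-fit: the model is evaluated on the same records it was built from."""
    hits = sum(nn1_predict(xs, ys, x) == y for x, y in zip(xs, ys))
    return hits / len(ys)


def loo_accuracy(xs, ys):
    """Generalization estimate: each record is predicted by a model that never saw it."""
    hits = 0
    for i in range(len(ys)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += nn1_predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(ys)


# Synthetic one-factor data; the last two labels are deliberately "noisy".
xs = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,), (0.15,), (1.05,)]
ys = ["low", "low", "low", "high", "high", "high", "high", "low"]
fit = resubstitution_accuracy(xs, ys)
loo = loo_accuracy(xs, ys)
```

Because each record is its own nearest neighbour, the resubstitution score is perfect even though the leave-one-out score is poor, which is exactly the kind of over-fitting that a goodness-of-fit figure alone would hide.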
Target species
The recruitment time series analysed for the North East Atlantic species that share the shelf break as a spawning and nursery area are summarized below: (1) The anchovy recruitment mixed time series (ARM) combines two anchovy recruitment time series: the long anchovy recruitment index time series (ARI; Borja et al., 1996), derived from the percentage of age-1 fish in the landings (40 years), and the Anchovy Recruitment series (AR; ICES, 2008a; 23 years). The resulting time series contains 45 years
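This snippet does not state how ARI and AR were combined into ARM. One plausible scheme, given here purely as an assumption, is to calibrate the longer index onto the scale of the shorter series by least-squares regression over the overlapping years, keeping the direct AR estimate wherever it exists. The lengths themselves do follow from the text: since 40 + 23 - 45 = 18, the two series must share 18 years. The years and values in the example are hypothetical.

```python
def fit_linear(x, y):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def merge_series(long_index, short_series):
    """Combine two year-keyed series (hypothetical calibration scheme).

    The long index is mapped onto the scale of the short series using the
    overlapping years; the short series is kept wherever it exists.
    """
    overlap = sorted(set(long_index) & set(short_series))
    if len(overlap) < 2:
        raise ValueError("need at least two overlapping years to calibrate")
    a, b = fit_linear([long_index[y] for y in overlap],
                      [short_series[y] for y in overlap])
    merged = {y: a + b * v for y, v in long_index.items()}
    merged.update(short_series)  # prefer the direct estimate where available
    return dict(sorted(merged.items()))


# Hypothetical years: a 40-year index and a 23-year series spanning 45 years in total.
ari = {year: 50.0 + year % 7 for year in range(1960, 2000)}
ar = {year: 2.0 * (50.0 + year % 7) + 1.0 for year in range(1982, 2005)}
merged = merge_series(ari, ar)
```

A linear calibration is only one option; rescaling both series to standardized anomalies over the overlap would be an equally plausible assumption.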
Pipeline comparison
Missing-value imputation can also be applied in the ‘NBC-Pipeline’; however, it produced no significant improvement. This was expected, since the NBC can be learned from data with missing values and no factor had a high proportion of missing values.
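Why imputation adds little to a naive Bayes classifier can be shown directly. When the conditional probabilities are estimated, a record with a missing factor simply contributes nothing for that factor. At prediction time, the likelihood term of a missing factor is dropped, which is equivalent to marginalizing it out, because the smoothed probabilities of each factor sum to one. The sketch below is a minimal categorical naive Bayes with Laplace smoothing written under that assumption. It is not the implementation used in this study, and the factor values and class labels are hypothetical.

```python
import math
from collections import defaultdict


def train_nb(records, labels, alpha=1.0):
    """Categorical naive Bayes; None marks a missing factor value, which is not counted."""
    classes = sorted(set(labels))
    n_features = len(records[0])
    counts = {c: [defaultdict(int) for _ in range(n_features)] for c in classes}
    observed = {c: [0] * n_features for c in classes}
    values = [set() for _ in range(n_features)]
    for record, c in zip(records, labels):
        for j, v in enumerate(record):
            if v is None:  # missing factor: skipped, nothing is imputed
                continue
            counts[c][j][v] += 1
            observed[c][j] += 1
            values[j].add(v)
    priors = {c: labels.count(c) / len(labels) for c in classes}
    return {"priors": priors, "counts": counts, "observed": observed,
            "values": values, "alpha": alpha}


def predict_nb(model, record):
    """Return the class with the highest log-posterior, ignoring missing factors."""
    alpha = model["alpha"]
    best, best_score = None, -math.inf
    for c, prior in model["priors"].items():
        score = math.log(prior)
        for j, v in enumerate(record):
            if v is None:  # marginalized out: this factor's likelihood term is dropped
                continue
            k = len(model["values"][j])
            count = model["counts"][c][j].get(v, 0)
            score += math.log((count + alpha) / (model["observed"][c][j] + alpha * k))
        if score > best_score:
            best, best_score = c, score
    return best


# Hypothetical categorical factors (temperature regime, upwelling) and recruitment classes.
records = [("warm", "strong"), ("warm", None), ("warm", "strong"),
           ("cold", "weak"), (None, "weak"), ("cold", "weak")]
labels = ["high", "high", "high", "low", "low", "low"]
model = train_nb(records, labels)
```

For example, `predict_nb(model, ("warm", None))` uses only the temperature factor, since the upwelling value is missing.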
Both classifiers, NB and FNB, show a good fit for most of the species considered (Fig. 1). The ‘MIS + FNB-Pipeline’ produces the best fit for the seven species (Table 2). The most interesting property of this fit for fisheries management is
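Assuming FNB denotes a flexible naive Bayes classifier, it differs from NB in how continuous factors are modelled: each class-conditional density is a kernel density estimate (cf. the reference on Bayesian classifiers based on kernel density estimation) rather than a single parametric distribution. The sketch below uses Gaussian kernels. The normal-reference bandwidth rule (1.06 * sd * n^(-1/5)), the bandwidth floor, and the toy data are assumptions chosen for illustration, not settings taken from this study.

```python
import math


def normal_reference_bandwidth(xs):
    """Assumed bandwidth rule: 1.06 * sd * n^(-1/5), with a small floor."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1)) if n > 1 else 0.0
    return max(1.06 * sd * n ** (-0.2), 1e-3)  # floor avoids a zero bandwidth


def kde(x, xs, h):
    """Gaussian kernel density estimate at x from the sample xs with bandwidth h."""
    norm = 1.0 / (len(xs) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs)


def train_fnb(records, labels):
    """Store, per class, the prior and each factor's sample with its bandwidth."""
    model = {}
    for c in sorted(set(labels)):
        rows = [r for r, lab in zip(records, labels) if lab == c]
        columns = list(zip(*rows))
        model[c] = {
            "prior": len(rows) / len(labels),
            "features": [(list(col), normal_reference_bandwidth(col)) for col in columns],
        }
    return model


def predict_fnb(model, record):
    """Naive Bayes decision using kernel density estimates as class-conditional densities."""
    def log_posterior(c):
        params = model[c]
        score = math.log(params["prior"])
        for x, (xs, h) in zip(record, params["features"]):
            score += math.log(max(kde(x, xs, h), 1e-300))  # guard against log(0)
        return score
    return max(model, key=log_posterior)


# Toy two-factor data for two hypothetical recruitment classes.
records = [(0.0, 0.2), (0.5, -0.1), (-0.3, 0.4), (5.1, 4.8), (4.7, 5.3), (5.4, 5.0)]
labels = ["low"] * 3 + ["high"] * 3
model = train_fnb(records, labels)
```

Kernel estimates let each class-conditional density take skewed or multimodal shapes that a single Gaussian would miss, at the cost of extra flexibility that makes the fit-versus-generalization check more important.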
Discussion
The main contribution of this work is the application of the methodology developed by Fernandes et al. (2010) to a broad set of species using a global set of variables. The forecasts for each species could be improved by incorporating more species-specific knowledge (e.g. more specific environmental data). However, the results show that, even with a global approach, machine-learning techniques can extract useful information for the recruitment forecasting problem. The
Acknowledgements
The research of Jose A. Fernandes and Nerea Goikoetxea is supported by a Doctoral Fellowship from the Fundación Centros Tecnológicos Iñaki Goenaga. This study has been supported by the following projects: Ecoanchoa (funded by the Department of Agriculture, Fisheries and Food of the Basque Country Government); the Saiotek and Research Groups 2007–2012 (IT-242-07) programs (Basque Government), TIN2008-06815-C02-01 (Spanish Ministry of Education and Science); COMBIOMED network in computational
References (48)
- et al. Modelling recruitment dynamics of hake, Merluccius merluccius, in the central Mediterranean in relation to key environmental variables. Fish. Res. (2008)
- Theory refinement on Bayesian networks
- et al. Potential improvements in the management of Bay of Biscay anchovy by incorporating environmental indices as recruitment predictors. Fish. Res. (2005)
- et al. Recruitment prediction with genetic algorithms with application to the Pacific Herring fishery. Ecol. Model. (2007)
- et al. Recruitment prediction for Pacific herring (Clupea pallasi) on the west coast of Vancouver Island, Canada. Ecol. Inf. (2008)
- et al. Fish recruitment prediction, using robust supervised classification methods. Ecol. Model. (2010)
- et al. Supervised pre-processing approaches in multiple class variables classification for fish recruitment forecasting. Environ. Model Softw. (2013)
- et al. Bayesian classifiers based on kernel density estimation: flexible classifiers. Int. J. Approx. Reason. (2009)
- Advantages and challenges of Bayesian networks in environmental modelling. Ecol. Model. (2007)
- et al. Relationship between anchovy (Engraulis encrasicholus) recruitment and the environment in the Bay of Biscay. Sci. Mar. (1996)