Consensus methods based on machine learning techniques for marine phytoplankton presence–absence prediction

doi:10.1016/j.ecoinf.2017.09.004

Ecological Informatics

Volume 42, November 2017, Pages 46-54

https://doi.org/10.1016/j.ecoinf.2017.09.004

Highlights

• We present six non-homogeneous consensus models to predict the presence–absence of marine phytoplankton species.
• In most cases, the consensus models performed better than the single models used to construct them.
• The single models considered were generalized linear models, random forests, boosting and support vector machines.
• Our results suggest that attention must be given to consensus methods when dealing with ecological prediction.

Abstract

We implemented several consensus methods by combining binary classifiers, mostly machine learning classifiers, with the aim of testing their capability as predictive tools for the presence–absence of marine phytoplankton species. The consensus methods were constructed from a combination of four single methods (i.e., generalized linear models, random forests, boosting and support vector machines). Six different consensus methods were analyzed, corresponding to six different ways of combining single-model predictions. Some of these methods are presented here for the first time. To evaluate the performance of the models, we considered eight phytoplankton species presence–absence data sets together with data on environmental variables. Some of the analyzed species are toxic, whereas others provoke water discoloration, which can cause alarm in the population. Besides the phytoplankton data sets, we tested the models on 10 well-known open access data sets. We evaluated the models' performances over a test sample. For most (72%) of the data sets, a consensus method was the method with the lowest classification error. In particular, a consensus method that weighted single-model predictions in accordance with single-model performances (the weighted average prediction error model, WA-PE) was the one that presented the lowest classification error most often. For the phytoplankton species, the errors of the WA-PE model ranged from 10% for the species Akashiwo sanguinea to 38% for Dinophysis acuminata. This study provides novel approaches to improve the prediction accuracy in species distribution studies and, in particular, in those concerning marine phytoplankton species.

Introduction

In the classification framework of machine learning (ML), ensemble methods or aggregating methods consist of combining the predictions of several classifiers (also called hypotheses or base classifiers) that are built on the same data set. The predictions are combined with the main goal of reducing variance and constructing a more stable and accurate predictor (James et al., 2014; Hastie et al., 2001; Bourel, 2012; Bourel, 2013). Ensemble methods have had great success not only in the ML community, but also among researchers from different fields with statistical modeling interests, because their accuracy is generally higher than that of individual classifiers (Polikar, 2006). Despite the merits of these methods, it is often a challenge to understand completely the theoretical framework behind them.

The strategy of combining the outputs of different classifiers relies on the individual classifiers making errors on different instances. The logic is that, if each classifier makes different errors, then a good combination of these classifiers can reduce the total error by compensating for the errors of the weaker classifiers. To this end, it is desirable to make each classifier as unique as possible with respect to misclassified instances. In particular, it is necessary to find classifiers whose decision boundaries are adequately different from those of the others. Such a set of classifiers is said to be diverse (Polikar, 2006; Brown et al., 2005, and references therein). In general, however, ensemble algorithms do not attempt to maximize a specific diversity measure. Rather, increased diversity is usually sought somewhat heuristically through various resampling procedures, or through the selection (randomly or not) of different training parameters, models, or subsets of features.
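The error-reduction argument can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that the base classifiers err independently and share the same error rate (an idealization that real classifiers rarely meet), and it computes the error of an unweighted majority vote. The function name `majority_vote_error` and the error rate of 0.3 are illustrative choices, not taken from the paper.

```python
from math import comb


def majority_vote_error(n_classifiers: int, individual_error: float) -> float:
    """Error rate of a majority vote over independent, equally accurate classifiers.

    Assumes an odd number of classifiers (so there are no ties) and
    independent errors; both are simplifying assumptions for this illustration.
    """
    if n_classifiers % 2 == 0:
        raise ValueError("use an odd number of classifiers to avoid ties")
    # The vote is wrong when more than half of the classifiers are wrong.
    min_wrong = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, k)
        * individual_error ** k
        * (1 - individual_error) ** (n_classifiers - k)
        for k in range(min_wrong, n_classifiers + 1)
    )


for n in (1, 3, 5):
    print(n, round(majority_vote_error(n, 0.3), 4))
```

Under these assumptions the error falls from 0.30 for a single classifier to about 0.22 with three classifiers and about 0.16 with five. When the classifiers tend to fail on the same instances, the gain shrinks, which is why diversity matters.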

Ensemble methods can be classified into two categories: homogeneous and non-homogeneous. Homogeneous methods combine classifiers of the same nature; examples of this type of method are bagging (Breiman, 1996a), random forests (RF) (Breiman, 2001), and boosting (Freund and Schapire, 1997; Schapire and Freund, 1998). In this paper, we will pay attention to non-homogeneous methods and we will refer to them as consensus methods. Consensus methods consist of a combination of various methods of a different nature. Examples of this type of method are stacking (Wolpert, 1992; Ting and Witten, 1999; Breiman, 1996b) and mixture of experts (Masoudnia and Ebrahimpour, 2014). The different predictors are combined in some way; for instance, in the case of mixture of experts, this is generally done by averaging (with or without weights) or by voting over the models' predictions. In the case of stacking, the outputs of the different classifiers are used to train another classifier, which provides the final decision rule of the method.
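The stacking idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the stacking variants evaluated in the paper: the meta-classifier is a plain logistic regression fitted by gradient descent (chosen only for brevity), and the level-one data (`level_one_probs`, `labels`) are made-up class-1 probabilities that three hypothetical base classifiers would output on held-out observations.

```python
import math


def fit_logistic(rows, labels, lr=0.5, epochs=5000):
    """Fit a logistic regression by batch gradient descent; returns (weights, bias)."""
    n_features = len(rows[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(x):
                grad_w[j] += (p - y) * xj
            grad_b += p - y
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b


def stack_predict(base_probs, w, b):
    """Final decision: the meta-classifier applied to the base classifiers' outputs."""
    z = b + sum(wi * pi for wi, pi in zip(w, base_probs))
    return 1 if z >= 0 else 0  # z >= 0 is equivalent to P(class 1) >= 0.5


# Hypothetical held-out outputs: one row per observation, one column per base
# classifier (the second classifier is deliberately uninformative).
level_one_probs = [
    [0.9, 0.6, 0.8],
    [0.2, 0.4, 0.1],
    [0.7, 0.3, 0.9],
    [0.4, 0.6, 0.2],
    [0.8, 0.5, 0.7],
    [0.3, 0.7, 0.3],
]
labels = [1, 0, 1, 0, 1, 0]

meta_w, meta_b = fit_logistic(level_one_probs, labels)
print(stack_predict([0.8, 0.45, 0.85], meta_w, meta_b))
```

In practice the level-one data should come from observations that the base classifiers did not see during training (for example, out-of-fold predictions); otherwise the meta-classifier learns from over-optimistic outputs.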

One way of building a mixture of experts is inspired, to some extent, by Bayesian voting, and it consists of assigning a weight to each hypothesis (Kuncheva, 2014). A classifier h generally calculates the posterior probability that a given observation belongs to a class. To fix the notation, suppose that h computes a vector $(p_{0}^{h}(x), p_{1}^{h}(x))$, where $p_{0}^{h}(x)$ and $p_{1}^{h}(x)$ are the posterior probabilities that observation x belongs to class 0 or to class 1, respectively. The consensus of different intermediate classifiers $h_{1}, \dots, h_{M}$ then generates a classifier F of the form

$$F(x) = \underset{k \in \{0, 1\}}{\operatorname{arg\,max}} \left( \sum_{m = 1}^{M} w_{h_{m}, L} \, p_{k}^{h_{m}}(x) \right),$$

where $w_{h_{m}, L}$ is the weight assigned to classifier $h_{m}$.

This type of combination is called a weighted averaging combining rule. In this paper, we will compare it empirically to other mixture-of-expert rules and to two versions of stacking.
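A minimal sketch of the weighted averaging rule for the two-class case follows. The combining function mirrors the formula for F(x); the weighting scheme in `weights_from_errors` (weights proportional to one minus each classifier's estimated error) is a hypothetical choice made for illustration and is not claimed to be the paper's WA-PE weighting. All posterior values and error rates are made up.

```python
def weighted_average_consensus(posteriors, weights):
    """Return the class k in {0, 1} that maximizes sum_m w_m * p_k^{h_m}(x).

    posteriors: one (p0, p1) pair per base classifier for a single observation.
    weights: one non-negative weight per base classifier.
    Ties are resolved in favor of class 0 (an arbitrary convention).
    """
    scores = [0.0, 0.0]
    for (p0, p1), w in zip(posteriors, weights):
        scores[0] += w * p0
        scores[1] += w * p1
    return 1 if scores[1] > scores[0] else 0


def weights_from_errors(errors):
    """Hypothetical weighting: proportional to (1 - error), normalized to sum to 1."""
    raw = [1.0 - e for e in errors]
    total = sum(raw)
    return [r / total for r in raw]


# Four base classifiers; the first is confident in class 0 but is assumed
# to be the least accurate one.
posteriors = [(0.9, 0.1), (0.35, 0.65), (0.4, 0.6), (0.45, 0.55)]
equal_weights = [0.25] * 4
performance_weights = weights_from_errors([0.5, 0.1, 0.1, 0.2])

print(weighted_average_consensus(posteriors, equal_weights))        # class 0
print(weighted_average_consensus(posteriors, performance_weights))  # class 1
```

With equal weights the rule reduces to averaging the posterior probabilities, and the confident but unreliable first classifier decides the outcome; down-weighting it by its estimated error lets the three more accurate classifiers prevail.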

Concerning the ecological modeling of species presence–absence, the performance of different statistical techniques can vary significantly from one case study to another, and it is not always clear which model is the most suitable. There are two possible strategies to reduce the models' uncertainty: (1) acquiring an understanding, via extensive model comparisons, of which method will generally provide the best predictive performance and under what conditions (Marmion et al., 2009b) and (2) using consensus methods, i.e., non-homogeneous ensemble methods (Thuiller, 2004; Thuiller et al., 2005; Araújo and New, 2007; Marmion et al., 2009b). As mentioned earlier, consensus methods overcome the problem of variability in the predictions of different single models since they are based on the combination of their predictions. Hence, a relevant combination of several unbiased (i.e., with good accuracy) model outputs will result in a more accurate prediction.

The matter rests in choosing adequate single models and finding a relevant algorithm to combine them. When dealing with ecological problems, ML techniques seem to be good candidates for single models because of their predictive capacity (Olden and Jackson, 2002). These techniques are frequently and increasingly considered in ecological studies, in particular in modeling species presence–absence or abundance from environmental variables (De’ath and Fabricius, 2000; Guisan et al., 2002; Drake et al., 2006; Cutler et al., 2007; Kampichler et al., 2010; Olden and Jackson, 2002). ML methods have advantages over traditional statistical methods (e.g., linear models and generalized linear models) since they can deal with some characteristics typical of ecological data such as unusual distributions, non-linearity, multiple missing values, complex data interactions, and dependence among observations (Guisan et al., 2002; Cutler et al., 2007; Crisci et al., 2012). Besides their flexibility, they typically outperform traditional approaches, making them ideal for modeling ecological systems (Olden et al., 2008). In fact, in ecological studies, ML methods are always considered when building consensus models (Marmion et al., 2009a; Marmion et al., 2009b; Lauzeral et al., 2015; Comte and Grenouillet, 2013; Thuiller et al., 2009). Besides ML techniques, more classical techniques such as generalized linear modeling or linear discriminant analysis are usually considered in the consensus construction (Thuiller et al., 2009; Marmion et al., 2009a; Marmion et al., 2009b; Lauzeral et al., 2015; Comte and Grenouillet, 2013) since, in some cases (e.g., linear relations between the predictors and the response variable), these methods may outperform ML techniques.

It must be noted that, although the consensus approach clearly has a number of attractive characteristics, the understanding of its merits for ecological prediction is still limited (Marmion et al., 2009b); hence, further studies comparing the predictive capacity of consensus methods with that of single methods are needed. It must be noted also that most of the applications of consensus methods in ecological studies are related to the study of species distribution models (SDMs) (Guisan and Thuiller, 2005).

In this paper, we explore the performance of six different consensus methods for predicting the presence–absence of eight marine phytoplankton species from the Atlantic coast of Uruguay. Four of the methods are a mixture of experts, and the other two are stacking applications. Moreover, we analyze the performance of the consensus models by considering 10 well-known open access data sets. To generate the consensus, we combined four individual models with very different structures, three of which have been documented as some of the most accurate ML techniques: boosting, RF, and support vector machine (SVM), whereas the fourth is a generalized linear model (GLM) that could better capture the linear relationships in data. For a more detailed description of these models, we refer the reader to the Supplementary material.

Section snippets

Methods

In this section, we present (i) the data sets used to evaluate the performance of the models; (ii) the principal concepts of supervised classification; (iii) a description of the consensus models analyzed in this work; (iv) the way in which we calculated the prediction error of the models; and (v) the model tuning and optimization, and the software and functions used.

Models' performance

With all the data sets considered together, the WA-PE consensus method was the model that presented the lowest generalization error most often (9 cases out of 18; Fig. 3, Table 2, Table 3). MV and StackRF among the consensus methods, and RF among the single methods, came next after WA-PE in “number of wins” (two wins each) (Fig. 3a). Finally, the remaining methods presented the lowest generalization error only once (GLM, boosting, and SVM) or on no occasion at all (MeanProb, WA-AUC, and

Consensus models' performance

In this study, we applied six different consensus methods to predict the presence–absence of marine phytoplankton species. Furthermore, we evaluated the performance of the models using open access data sets.

To construct the consensus, we decided to combine three ML techniques that are well known in the ML community, generally perform very well, and, at the same time, are well known and broadly used in ecological studies (e.g., Cutler et al., 2007; De’ath, 2007; Guo et al., 2005). It

Conclusions

Consensus methods present an interesting alternative for developing predictive tools to create sound monitoring and management tools. They have been shown to produce favorable results compared with those of single methods (Polikar, 2006), although further applications in ecology are needed to determine the potential of these methods. In particular, further knowledge in the context of marine phytoplankton, and especially on species that represent challenges for water managers and

Acknowledgments

This work was supported by ECOS-Sud Aprendizaje Automático para la Modelización y el Análisis de Recursos Naturales (project n° U14E02) and by ANII-Uruguay.

References (74)

Anderson, C.R., et al. (2010). Predicting potentially toxigenic Pseudo-nitzschia blooms in the Chesapeake Bay. J. Mar. Syst.
Araújo, M.B., et al. (2007). Ensemble forecasting of species distributions. Trends Ecol. Evol.
Bergamino, L., et al. (2016). Trophic niche shifts driven by phytoplankton in sandy beach ecosystems. Estuar. Coast. Shelf Sci.
Brown, G., et al. (2005). Diversity creation methods: a survey and categorisation. Inf. Fusion
Crisci, C., et al. (2012). A review of supervised machine learning algorithms and their applications to ecological data. Ecol. Model.
Freund, Y., et al. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci.
Guisan, A., et al. (2002). Generalized linear and generalized additive models in studies of species distributions: setting the scene. Ecol. Model.
Guo, Q., et al. (2005). Support vector machines for predicting distribution of sudden oak death in California. Ecol. Model.
Jeong, H.J., et al. (2013). Red tides in Masan Bay, Korea in 2004-2005: I. Daily variations in the abundance of red-tide organisms and environmental factors. Harmful Algae
Jeong, H.J., et al. (2015). A hierarchy of conceptual models of red-tide generation: nutrition, behavior, and biological interactions. Harmful Algae

Jeong, K.-S., et al. (2001). Prediction and elucidation of phytoplankton dynamics in the Nakdong River (Korea) by means of a recurrent artificial neural network. Ecol. Model.
Kampichler, C., et al. (2010). Classification in conservation biology: a comparison of five machine-learning methods. Eco. Inform.
Kim, J.H., et al. (2016). Killing potential protist predators as a survival strategy of the newly described dinoflagellate Alexandrium pohangense. Harmful Algae
Lee, J.H., et al. (2003). Neural network modelling of coastal algal blooms. Ecol. Model.
Marmion, M., et al. (2009). Statistical consensus methods for improving predictive geomorphology maps. Comput. Geosci.
McGillicuddy, D. (2010). Models of harmful algal blooms: conceptual, empirical, and numerical approaches. J. Mar. Syst.
Moisen, G.G., et al. (2006). Predicting tree species presence and basal area in Utah: a comparison of stochastic gradient boosting, generalized additive models, and tree-based methods. Ecol. Model.
Odebrecht, C., et al. (2014). Surf zone diatoms: a review of the drivers, patterns and role in sandy beaches food chains. Estuar. Coast. Shelf Sci.
Richardson, A., et al. (2003). A dynamic quantitative approach for predicting the shape of phytoplankton profiles in the ocean. Prog. Oceanogr.
Scardi, M., et al. (1999). Developing an empirical model of phytoplankton primary production: a neural network case study. Ecol. Model.
Wilson, H., et al. (2001). Towards a generic artificial neural network model for dynamic predictions of algal abundance in freshwater lakes. Ecol. Model.
Wolpert, D. (1992). Stacked generalization. Neural Netw.
Alexandre, L.A., et al. Combining independent and unbiased classifiers using weighted average.
Andersen, P., et al. (2003). Estimating Cell Numbers. Manual on Harmful Marine Microalgae
Bourel, M. (2012). Model aggregation methods and applications. Mem. Trab. difusión Cient. Tec.
Bourel, M. (2013). Apprentissage statistique par aggregation de modeles. Ph.D Thesis Université Aix-Marseille, France
Breiman, L. (1996a). Bagging predictors. Mach. Learn.
Breiman, L. (1996b). Stacked regression. Mach. Learn.
Breiman, L. (1998). Arcing classifiers. Ann. Stat.
Breiman, L. (2001). Random forests. Mach. Learn.
Brotons, L., et al. (2004). Presence–absence versus presence-only modelling methods for predicting bird habitat suitability. Ecography
Campbell, E.E. (1996). The global distribution of surf diatom accumulations. Rev. Chil. Hist. Nat.
Comte, L., et al. (2013). Species distribution modelling and imperfect detection: comparing occupancy versus consensus methods. Divers. Distrib.
Cutler, D.R., et al. (2007). Random forests for classification in ecology. Ecology
De’ath, G. (2007). Boosted trees for ecological modeling and prediction. Ecology
De’ath, G., et al. (2000). Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology
Devroye, L., et al. (1997). A Probabilistic Theory of Pattern Recognition, volume 31 of Applications of Mathematics

