Advanced turbidity prediction for operational water supply planning
Introduction
Turbidity can be defined as the “optical quality [of water] that causes light to be scattered and absorbed rather than transmitted in straight lines through a sample” [1, p.200]. It can also be understood to be “the cloudiness of water caused by suspended particles such as clay and silts, chemical precipitates such as manganese and iron, and organic particles such as plant debris and organisms” [2, p.3].
While turbidity itself does not present a hazard to human health, it can be an indication of poor water quality. Furthermore, high levels of turbidity present during the treatment of raw water can limit the effectiveness of filtration and chlorination processes designed to remove dangerous bacteria and parasites such as Cryptosporidium [1]. It is therefore recommended by the World Health Organisation (WHO) that turbidity should not exceed a level of 1 Nephelometric Turbidity Unit (NTU) before chlorination [3].
Turbidity (NTU) levels can change slowly over time due to changes in water catchments as part of an underlying trend, but they can also peak rapidly over shorter periods, sometimes appearing random. Peaks in turbidity are linked to environmental events such as heavy rainfall but can also be a result of operational activities like pumping. Inherent solution features at a site, such as fissures within the aquifer, can also lead to turbidity events [2].
Peaks in turbidity (NTU) present a significant challenge to the operation of a drinking water company. Turbidity is a naturally occurring phenomenon and somewhat inevitable; however, for a drinking water supplier there are many operational interventions that affect its ability to continue to supply potable water. Depending on the treatment works, there may be varying degrees of treatment activity used to reduce turbidity. The works most resilient to turbidity will likely include a system of filters and settling tanks that remove sediment before chlorination, but even these sites with more complex processes have a limited capacity before treatment must be suspended for cleaning and maintenance. In response to short-term outages, a water supplier may rely on storage reservoirs, alternative treatment works or, more likely, a combination of both. The challenge, however, is that turbidity peaks can occur rapidly, so these mitigating activities must be actionable immediately: storage reservoirs will need sufficient supplies, and alternative sources will need to be able to meet the additional requirement caused by outages. Failure to do so will leave the company unable to meet its demand, or allow water that is not fit for consumption to enter supply; either outcome would be damaging for a water supplier and its customers.
We propose a decision support system that provides drinking water suppliers with seven days' notice of a turbidity event, allowing time for remedial actions to be prepared in advance of short-term outages caused by turbidity peaking events.
Our first objective is to explore the causes of daily turbidity (NTU) peaking events by identifying candidate predictor variables, which we test across each of the six sites. We use this to confirm relevant variables from the literature, but also to explore further how operational features such as pumping activity impact turbidity levels at a treatment works. We apply a static correlation analysis and a dynamic cross-correlation analysis that considers time lags for some variables.
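As an illustration of the dynamic analysis, a lagged cross-correlation can be computed by shifting a candidate predictor against turbidity. The sketch below uses synthetic data; names such as `rainfall` are placeholders, not the study's variables:

```python
import numpy as np

def lagged_cross_correlation(x, y, max_lag):
    """Pearson correlation between x lagged by 0..max_lag steps and y.
    A peak at lag k suggests x leads y by roughly k days."""
    n = len(x)
    return {lag: float(np.corrcoef(x[: n - lag], y[lag:])[0, 1])
            for lag in range(max_lag + 1)}

# Synthetic example: "turbidity" responds to "rainfall" two days later.
rng = np.random.default_rng(0)
rainfall = rng.gamma(2.0, 1.0, size=500)
turbidity = np.roll(rainfall, 2) + rng.normal(0.0, 0.1, size=500)

corrs = lagged_cross_correlation(rainfall, turbidity, max_lag=5)
best_lag = max(corrs, key=corrs.get)   # lag with the strongest correlation
```

A peak in the correlation profile at lag k suggests the predictor leads turbidity by roughly k days, which is one way lagged candidate variables can be short-listed for the models.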
Our second objective is to assess the effectiveness of models from the field of machine learning for the prediction of >1 NTU events. We use a Generalised Linear Model (GLM) and a non-linear Random Forest (RF) model to predict turbidity across the six treatment works. Using both a linear and a non-linear model allows us to assess the impact of any non-linearity that may exist in the data; furthermore, they represent different aspects of the trade-off between complexity and predictive capability [4]. We use the AUROC score to assess the performance of the models; models with a score greater than 0.70 are considered satisfactory. The causation analysis is complemented using the Variable Importance outputs from the GLM and the RF. We also review the cut-off probability points for event classification.
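As a hedged sketch of this model comparison (synthetic data; in scikit-learn terms, a GLM for a binary >1 NTU outcome corresponds to a logistic regression):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a site's data: predictors -> binary >1 NTU event.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))                 # e.g. lagged rainfall, pumping
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 1000) > 1).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

glm = LogisticRegression().fit(X_tr, y_tr)     # linear benchmark
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

auc_glm = roc_auc_score(y_te, glm.predict_proba(X_te)[:, 1])
auc_rf = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
```

Comparing the two AUROC scores against the 0.70 satisfaction threshold mirrors the evaluation logic; any gap between the RF and the GLM hints at non-linearity in the data.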
To address the research problem, we first review how machine learning has been applied in the literature to predict a range of other water quality parameters, and identify causal factors in turbidity peaking. We then illustrate the behaviour of turbidity peaking across the six sites and use the static and dynamic correlation analyses to determine candidate variables for the models. We consider the results in three parts: 1) the performance of the GLM and RF models is reviewed using the AUROC metric, 2) the Variable Importance outputs of the models are examined to understand the multivariate nature of turbidity prediction, complementing the earlier correlation analysis, and 3) we use a cost-based approach to define the cut-off probability points for event classification at each of the sites. We conclude by reflecting on the general findings for turbidity causation compared to those of the literature and assess the viability of an operational turbidity prediction model as a decision support system.
Section snippets
Background
Techniques from the field of Machine Learning have been applied to solve a wide range of event prediction problems [[5], [6], [7], [8]]. In this section, we seek to understand how statistical and machine learning tools and techniques have been applied to understand and solve a variety of challenges surrounding water quality. Some of the research is focused upon causal analysis, while other research attempts to predict and model systems to be tested under different conditions; this, in turn, can
Data description
Turbidity, the dependent variable, is obtained for each of the six sites from the telemetry system of the water company. The turbidity (NTU) level is recorded at least every 15 min using apparatus located at the water treatment works. For this paper, only the daily maximum NTU level is required, as it always reveals whether turbidity has exceeded 1 NTU on a given day. The record for most sites extends back to at least 1 November 2007, up to the point of extraction on 15 September 2017. Therefore,
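The reduction of 15-minute telemetry to a daily maximum, and the derivation of the >1 NTU event flag, can be sketched with pandas on mock data (all values and names below are illustrative):

```python
import numpy as np
import pandas as pd

# Mock 15-minute telemetry covering two days (values are illustrative).
idx = pd.date_range("2017-09-01", periods=192, freq="15min")
ntu = pd.Series(0.8 * np.abs(np.sin(np.linspace(0, 8, 192))),
                index=idx, name="ntu")
ntu.iloc[100] = 1.4                      # inject a brief peak on the second day

daily_max = ntu.resample("D").max()      # daily maximum NTU level
event = (daily_max > 1.0).astype(int)    # >1 NTU event flag per day
```

Taking the daily maximum preserves even a single 15-minute exceedance, which is why it suffices to detect whether the 1 NTU threshold was breached on any given day.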
Modelling
For the prediction of >1 NTU turbidity events, we apply a Random Forest (RF) and a Generalised Linear Model (GLM).
Several parameters require tuning in an RF model; the number of variables available for each split, the number of trees to include, and a cost rule by which predictions are made in the training of each tree.
As recommended by Breiman [20], we use the square root of the total number of variables, along with this figure halved and doubled.
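Illustratively, the three mtry candidates (the square root of the number of predictors, halved and doubled) can be generated and searched as a small grid; this is a sketch on synthetic data, not the authors' implementation:

```python
import math
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

n_features = 16
base = max(1, round(math.sqrt(n_features)))              # Breiman's default mtry
mtry_grid = sorted({max(1, base // 2), base, min(n_features, base * 2)})

# Hypothetical data standing in for the site predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, n_features))
y = (X[:, 0] > 0).astype(int)

search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid={"max_features": mtry_grid},              # sklearn's name for mtry
    scoring="roc_auc",
    cv=3,
).fit(X, y)
```

With 16 predictors this yields candidates of 2, 4 and 8 variables per split, and the grid search selects among them by cross-validated AUROC.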
The number of trees used in the model is referred to as
Model performance
We present the AUROC performance of the RF and GLM models at each of the six sites in Table 4. We use a randomly selected 25%/75% test/train split with 10-fold cross-validation and 10 repeats. For the cross-validation results, we present the mean AUROC across the 100 samples and the standard deviation.
At five of six sites (Site-A, Site-B, Site-C, Site-D, Site-E), we obtained an AUROC score of over 0.80 in the holdout sample suggesting that these models have a ‘good’ discriminative ability.
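The repeated cross-validation scheme, yielding 100 AUROC samples per model, can be sketched as follows (synthetic data, illustrative only; the real evaluation used the sites' telemetry):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Hypothetical features standing in for the site predictors.
rng = np.random.default_rng(7)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + rng.normal(0, 0.3, 300) > 0).astype(int)

# 10-fold cross-validation repeated 10 times -> 100 AUROC samples.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y, scoring="roc_auc", cv=cv,
)
mean_auc, sd_auc = scores.mean(), scores.std()  # summary statistics per model
```

Reporting the mean and standard deviation over the 100 folds gives a sense of both central performance and its stability across resamples.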
Implementation of the decision support system
So far we have considered the performance of the models across the six sites in terms of AUROC performance. We now consider the probability, from 0.00 to 1.00, at which the decision support system positively classifies an event and the water company takes mitigating steps. Mitigating actions might include a decision to ensure that the reservoirs are full before an event, or committing personnel to test the equipment at an alternative site while pumping at the site in question is temporarily
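A cost-based cut-off of this kind can be sketched by scanning candidate probabilities and minimising expected misclassification cost; the costs and example values below are placeholders, not the company's figures:

```python
import numpy as np

def best_cutoff(y_true, p_event, cost_fp, cost_fn):
    """Return the probability cut-off minimising total misclassification cost.
    cost_fp: cost of a needless intervention; cost_fn: cost of a missed event."""
    thresholds = np.linspace(0.0, 1.0, 101)
    costs = [cost_fp * np.sum((p_event >= t) & (y_true == 0)) +
             cost_fn * np.sum((p_event < t) & (y_true == 1))
             for t in thresholds]
    return float(thresholds[int(np.argmin(costs))])

# Toy example: a missed event costs 10x a false alarm, pushing the cut-off down.
y = np.array([0, 0, 0, 0, 1, 1])
p = np.array([0.1, 0.2, 0.3, 0.6, 0.4, 0.9])
cutoff = best_cutoff(y, p, cost_fp=1.0, cost_fn=10.0)
```

Because missed turbidity events are far costlier than precautionary reservoir filling, such an asymmetry drives the operational cut-off well below 0.50.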
Causation
The first aim of this paper was to identify the potential variables causing daily turbidity (NTU) peaking events at groundwater sources for a water company operating on the South Coast of England. Several approaches have been used to understand the cause of turbidity (NTU) at six water sources on the South Coast of England: a static correlation analysis, a dynamic correlation analysis and an assessment of variable importance. We sought to confirm the findings of the hydrological literature
Acknowledgments
We would like to acknowledge the anonymous company who provided the data.
This work was supported by the Economic and Social Research Council [grant number ES/P000673/1]; and The Alan Turing Institute under the EPSRC [grant number EP/N510129/1].
Matthew Stevenson is a PhD student in the Southampton Business School at the University of Southampton. He completed an MSc in Business Analytics & Management Science at the University of Southampton in 2017. His main research interests are predictive analytics, deep learning and natural language processing.
References (27)
- et al., Development and application of consumer credit scoring models using profit-based classification measures, European Journal of Operational Research (2014)
- et al., A decision support system for predictive police patrolling, Decision Support Systems (2015)
- et al., Early detection of network element outages based on customer trouble calls, Decision Support Systems (2015)
- et al., A data mining based system for credit-card fraud detection in e-tail, Decision Support Systems (2017)
- et al., A novel approach for automated credit card transaction fraud detection using network-based extensions, Decision Support Systems (2015)
- et al., Modeling the relationship between land use and surface water quality, Journal of Environmental Management (2002)
- et al., Investigating transport properties and turbidity dynamics of a karst aquifer using correlation, spectral, and wavelet analyses, Journal of Hydrology (2006)
- et al., Water Supply (1994)
- Water Quality and Health - Review of Turbidity: Information for Regulators and Water Suppliers (2017)
- Guidance on the Implementation of the Water Supply (Water Quality) Regulations 2000 (As Amended) in England (March 2012)
- Occurrence of Giardia and Cryptosporidium spp. in surface water supplies, Applied and Environmental Microbiology
- The importance of lake-specific characteristics for water quality across the continental United States, Ecological Applications
- Neural network and genetic programming for modelling coastal algal blooms, International Journal of Environment and Pollution
Dr Cristián Bravo is Associate Professor of Business Analytics. His research focuses on the development and application of data science methodologies in the context of credit risk analytics, covering areas such as deep learning, text analytics, image processing, and social network analysis. Dr Bravo has an extensive publication list in journals and international conferences, covering multiple topics in data science and analytics. Dr Bravo is also an editorial board member of the journal Applied Soft Computing, the official journal of the World Federation on Soft Computing, and of the Journal of Business Analytics published by the UK's Operational Research Society.