Modelling SO2 concentration at a point with statistical approaches
Introduction
The level of sulphur emissions, mainly as SO2, has decreased continuously over the last 20 years in most western industrialised countries (compare the data reported by Holland et al. (1999) with the historical data reported by Gschwandtner et al. (1986); see also Lins, 1987; Zannetti, 1990). However, localised SO2 problems still exist, related to local emission, meteorological and topographical factors. By contrast, sulphur emissions are increasing in the emerging industrialised countries of Eastern Europe as well as in other developing countries around the world. Hence, environmental problems associated with sulphur emissions are still far from being fully solved.
Modelling SO2 air pollution is a complex task which has drawn the attention of many scientists all over the world since the early 1960s. Unfortunately, the literature on this subject shows that a universal technique for modelling SO2 time series recorded at specific points in a given area does not exist. To address this issue, which is relevant to the problem of controlling the levels of SO2 pollution, several modelling approaches have been proposed, of both deterministic and statistical type. The literature shows (Gilbert, 1987; Zannetti, 1990) that statistical approaches are frequently considered for short-term forecasting applied to real-time control of emissions or to air quality assessment. These methods have some advantages over deterministic approaches. Firstly, they do not need data about emissions (which are sometimes unavailable, especially in real time) since they are based on air quality and meteorological measurements only (which, in turn, are largely available from air quality and meteorological monitoring networks). Secondly, the structure of statistical models is often simpler than that of deterministic models, so they can more easily be implemented and used by non-experts. However, statistical models are not portable from site to site since they are developed and calibrated on local data.
Several statistical approaches have been proposed in the literature (Gardner and Dorling, 1998; Finzi et al., 1998; Nunnari et al., 1998; Nunnari et al., 2001). For modelling SO2 concentrations, artificial intelligence (AI) based techniques, namely artificial neural network based models and neuro-fuzzy models, seem to be the most promising. However, a systematic inter-comparison of these approaches with other emerging techniques, such as wavelet based approaches, generalised linear models, local prediction in phase-space and generalised additive models, has never been carried out.
The present paper describes the results obtained by inter-comparing several statistical techniques for modelling SO2 concentration at a point, such as neural networks, fuzzy logic, generalised additive techniques and other recently proposed statistical approaches. The inter-comparison results from a collaboration between the partners involved in the APPETISE project, funded under the EC–Information Societies and Technologies (IST) Framework V programme. One of the main aims of this work is to give guidelines, based on the results of the modelling inter-comparison, for designing a warning system for air quality assessment. The results of a similar inter-comparison for surface ozone were presented by Schlink et al. (2003), while inter-comparisons for nitrogen oxides and particulate modelling are reported by Kukkonen et al. (2002) and Partanen et al. (in press).
Test areas and targets
Two case studies were selected: the Siracusa industrial area, in Italy, where the pollution is largely due to industrial emissions, and the Belfast urban area, in the UK, where pollution is mainly due to domestic heating.
The Siracusa industrial area is situated in the south-east of Sicily (Italy). In the post-war period, one of the largest concentrations of petrochemical industries in Europe developed here and it is considered to be an area of high environmental risk. The air
Inter-compared techniques
The following techniques were selected for the inter-comparison exercise:
- ANN: artificial neural networks with backpropagation training algorithm;
- MNN: artificial neural networks with maximum likelihood cost function and conjugate gradient training algorithm;
- WAG: wavelet functions with genetic algorithms;
- NFU: neuro-fuzzy techniques;
- GAM: generalised additive models;
- LPH: local prediction in phase-space;
- LIN: linear time-series model;
- PER: persistence model.
A detailed description of these techniques is beyond
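As an illustration of the two simplest techniques in the list, the following minimal sketch shows a persistence forecast (PER) and a first-order linear autoregressive fit standing in for a linear time-series model (LIN). The function names and the choice of a single lag are assumptions made for illustration only; the excerpt does not specify the order or form of the LIN model actually used.

```python
from typing import List, Tuple


def persistence_forecast(series: List[float]) -> List[float]:
    """PER baseline: the forecast for step t+1 is the observation at step t."""
    return list(series[:-1])


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y[t+1] = a * y[t] + b.

    Assumption: a one-lag model is used here only as a simple stand-in
    for a linear time-series model.
    """
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("series is constant; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def ar1_forecast(series: List[float], a: float, b: float) -> List[float]:
    """One-step-ahead forecasts aligned with series[1:]."""
    return [a * value + b for value in series[:-1]]
```

Both forecasts are aligned with `series[1:]`, so either can be scored against the same observations.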
Performance indices
In order to inter-compare the considered statistical approaches objectively, several performance indices were taken into account. We have grouped these indices into two separate sets: (1) global fit indices, i.e. those indices that measure the fit of the overall time series (e.g. the RMSE), and (2) those that measure the capability of a given model to predict critical episodes (e.g. the SP index), referred to here as exceedence indices. A list
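The two groups of indices can be illustrated with a minimal sketch: the RMSE as a global fit index, and two contingency-style measures of how well threshold exceedances are predicted. The exceedance functions below are generic illustrations with hypothetical names; the excerpt does not define the SP index, so they should not be read as its definition. The threshold value is likewise an assumption supplied by the caller.

```python
import math
from typing import List, Optional, Tuple


def rmse(observed: List[float], predicted: List[float]) -> float:
    """Root mean square error over the whole series (global fit)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def exceedance_counts(observed: List[float], predicted: List[float],
                      threshold: float) -> Tuple[int, int, int]:
    """Return (correctly forecast, observed, forecast) exceedance counts."""
    hits = sum(1 for o, p in zip(observed, predicted)
               if o > threshold and p > threshold)
    n_observed = sum(1 for o in observed if o > threshold)
    n_forecast = sum(1 for p in predicted if p > threshold)
    return hits, n_observed, n_forecast


def fraction_detected(observed: List[float], predicted: List[float],
                      threshold: float) -> Optional[float]:
    """Share of observed exceedances that were also forecast."""
    hits, n_observed, _ = exceedance_counts(observed, predicted, threshold)
    return hits / n_observed if n_observed else None


def false_alarm_fraction(observed: List[float], predicted: List[float],
                         threshold: float) -> Optional[float]:
    """Share of forecast exceedances that did not occur."""
    hits, _, n_forecast = exceedance_counts(observed, predicted, threshold)
    return (n_forecast - hits) / n_forecast if n_forecast else None
```

A model can score well on RMSE while detecting few exceedances, which is why the two sets of indices are kept separate.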
Structure of prediction models
It is necessary to stress here that finding the most appropriate structure for a statistical air quality prediction model (i.e. the exogenous inputs) is perhaps one of the major problems for modellers. First of all, the candidate variables are often numerous and not necessarily known a priori. Moreover, the link between the pollutant concentration and the exogenous inputs is non-linear and depends on the geographical location of the measurement point. Further, the selected
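To make the notion of model structure concrete, the sketch below assembles input/output pairs for a one-step-ahead predictor from lagged pollutant values and current values of exogenous variables. The function name, the number of lags and the example variable are hypothetical; the excerpt does not state which inputs were finally selected.

```python
from typing import Dict, List, Tuple


def build_regression_pairs(
    target: List[float],
    exogenous: Dict[str, List[float]],
    n_lags: int,
) -> Tuple[List[List[float]], List[float]]:
    """Build rows [y[t-n_lags+1], ..., y[t], u1[t], u2[t], ...] -> y[t+1].

    `target` is the pollutant series; each entry of `exogenous` is a
    series of the same length (for example a meteorological variable).
    """
    rows: List[List[float]] = []
    outputs: List[float] = []
    for t in range(n_lags - 1, len(target) - 1):
        past = target[t - n_lags + 1 : t + 1]        # autoregressive part
        exo = [series[t] for series in exogenous.values()]  # exogenous part
        rows.append(list(past) + exo)
        outputs.append(target[t + 1])                # value to be predicted
    return rows, outputs
```

Changing `n_lags` or the set of exogenous series changes the model structure without touching the learning algorithm, which is what makes structure selection a separate problem.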
Missing data interpolation
In order to provide a common set of data during the inter-comparison exercise, the problem of missing data was addressed first, leading to the implementation of appropriate procedures for missing data interpolation. This pre-processing phase was carried out year by year on the whole dataset (including meteorological and pollution data). It should also be noted that some of the modelling techniques considered (for example, LPH) require imputed data, as they are not able to handle missing
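The excerpt does not say which interpolation procedure was adopted. Purely as an illustration, the sketch below uses linear interpolation, swapped in here as a common simple choice, and fills only interior gaps up to a caller-chosen length. The function name and the `max_gap` parameter are assumptions.

```python
from typing import List, Optional


def interpolate_gaps(values: List[Optional[float]],
                     max_gap: int) -> List[Optional[float]]:
    """Linearly fill runs of None no longer than max_gap.

    Gaps at the start or end of the series, and gaps longer than
    max_gap, are left as None.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i                      # first valid index after the gap, or n
        gap = end - start
        if start > 0 and end < n and gap <= max_gap:
            left, right = filled[start - 1], filled[end]
            for k in range(start, end):
                frac = (k - start + 1) / (gap + 1)
                filled[k] = left + frac * (right - left)
    return filled
```

Applying the function separately to each year's records would mirror the year-by-year pre-processing described above.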
Results and discussion
For the sake of brevity, results will be reported extensively here only for the DMEA target, which was the only one characterised by appreciable accuracy. However, numerical results and considerations will also be given for the DMAX and HMAX targets. Results that refer to DMEA models of MF type are reported in Tables 7a and b for Melilli and Belfast, respectively, while the performance of NMF models is shown in Tables 7c and d. As expected, results show that MF models perform better than NMF
Conclusions
In this paper, some of the most promising statistical techniques for the prediction of SO2 concentration at a point were compared. The results show that there is no single modelling approach that generates optimum results in terms of the full range of performance indices considered. However, assuming that, with a view to implementing a warning system for air quality control, approaches that perform better in predicting critical episodes are to be preferred, the artificial
Acknowledgements
The support of the European Commission’s Framework V IST Programme (contract no. IST-1999-11764) is gratefully acknowledged. We would like to thank the Province of Siracusa, the UK Air Quality Archive and the British Atmospheric Data Centre for providing all the datasets considered in this work.
References (64)
- et al. A neural network-based method for short-term predictions of ambient SO2 concentrations in highly polluted industrial areas of complex terrain. Atmospheric Environment (1993)
- et al. A model for predicting maximum and 8 h average ozone in Houston. Atmospheric Environment (1999)
- et al. Modelling the effects of meteorology on ozone in Houston using cluster analysis and generalised models. Atmospheric Environment (1998)
- et al. Maximum likelihood cost functions for neural networks models of air quality data. Atmospheric Environment (2003)
- et al. Artificial neural networks (the multilayer perceptron): a review of applications in the atmospheric sciences. Atmospheric Environment (1998)
- Trend analysis of monthly sulfur dioxide emissions in the conterminous United States. Atmospheric Environment (1987)
- et al. The application of neural techniques to the modelling of time series of atmospheric pollution data. Ecological Modelling (1998)
- et al. A rigorous inter-comparison of ground-level ozone predictions. Atmospheric Environment (2003)
- Analysis of Observed Chaotic Data (1996)
- et al. A neural architecture to predict pollution in industrial areas
- Neural net approximation
- Universal approximation bounds for superposition of a sigmoidal function. IEEE Transactions on Information Theory
- Wavelets in identification. Proceedings of the SYSIS’94
- Time Series Analysis, Forecasting and Control
- Time Series Analysis, Forecasting and Control
- Estimating optimal transformations for multiple regression and correlation. American Statistical Association
- Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association
- Approximation by super precision of a sigmoidal function
- Using rule sets to maximize ROC performance
- Real-time ozone episode forecast: a comparison between neural network and grey-box models
- Practical Methods of Optimization
- Gauss–Newton approximation to Bayesian regularization
- Error functions for predicting episodes of poor air quality
- Statistical Methods for Environmental Pollution Monitoring
- Genetic Algorithm in Search, Optimization and Machine Learning
- Characterisation of strange attractors. Physical Review Letters
- Historic emissions of sulfur and nitrogen oxides in the United States from 1900 to 1980. JAPCA
- Generalized Additive Models
- Generalized additive models: some applications. Journal of the American Statistical Association
- Fuzzy-Neural Networks, Soft Computing Series
- 1 JANN (Java Artificial Neural Network) Tool for air pollution modelling by using multi-layer perceptron neural networks.