Modelling SO₂ concentration at a point with statistical approaches

doi:10.1016/j.envsoft.2003.10.003

Environmental Modelling & Software

Volume 19, Issue 10, October 2004, Pages 887-905

https://doi.org/10.1016/j.envsoft.2003.10.003

Abstract

This paper reports the results of inter-comparing several statistical techniques for modelling SO₂ concentration at a point, including neural networks, fuzzy logic, generalised additive techniques and other recently proposed statistical approaches. The inter-comparison is the result of a collaboration between some of the partners of the APPETISE project, funded under the Framework V Information Societies and Technologies (IST) programme. Two case studies were selected: the Siracusa industrial area, in Italy, where the pollution is dominated by industrial emissions, and the Belfast urban area, in the UK, where domestic heating makes an important contribution. The different kinds of pollution (industrial/urban) and the different locations of the areas considered make the results more general. To make the inter-comparison more objective, all the modellers used the same datasets. Missing data in the original time series were filled using appropriate techniques. The inter-comparison was carried out on a rigorous basis, according to the performance indices recommended by the European Topic Centre on Air and Climate Change (ETC/ACC). The targets for the implemented prediction models were defined according to the EC norms on limit values for sulphur dioxide. Accordingly, three different kinds of targets were considered, namely daily mean values, daily maximum values and hourly mean values. The inter-compared models were tested on real cases of poor air quality. In the paper, the inter-compared techniques are ranked in terms of their capability to predict critical episodes. A ranking of the three targets in terms of their predictability is also proposed. Several key issues are illustrated and discussed, such as the role of input variable selection, the use of meteorological data, and the use of interpolated time series. Moreover, a novel approach, referred to as the technique of balancing the training pattern set, which was successfully applied to improve the capability of ANN models to predict exceedences, is introduced. The results show that no single modelling approach generates optimum results across the full range of performance indices considered. For the implementation of a warning system for air quality control, approaches that work better in the prediction of critical episodes should be preferred. Therefore, artificial neural network prediction models can be recommended for this purpose. The best forecasts were achieved for daily averages of SO₂, while daily maximum and hourly mean values are difficult to predict with acceptable accuracy.

Introduction

The level of sulphur emissions, mainly as SO₂, has been decreasing continuously over the last 20 years in most western industrialised countries (compare the data reported by Holland et al. (1999) with the historical data reported by Gschwandtner et al. (1986), Lins (1987) and Zannetti (1990)). However, localised SO₂ problems still exist, related to local emission, meteorological and topographical factors. By contrast, sulphur emissions are increasing in the emerging industrialised countries of Eastern Europe as well as in other developing countries around the world. Hence, environmental problems associated with sulphur emissions are still far from being fully solved.

Modelling SO₂ air pollution is a complex task which has drawn the attention of many scientists all over the world since the early 1960s. Unfortunately, the literature on this subject shows that a universal technique for modelling SO₂ time series recorded at specific points in a given area does not exist. To address this issue, which is relevant to the problem of controlling the levels of SO₂ pollution, several modelling approaches of both deterministic and statistical type have been proposed. The literature shows (Gilbert, 1987; Zannetti, 1990) that statistical approaches are frequently considered for short-term forecasting applied to real-time control of emissions or to air quality assessment. These methods have some advantages over deterministic approaches. Firstly, they do not need data about emissions (which are sometimes unavailable, especially in real time), since they are based only on air quality and meteorological measurements (which, in turn, are widely available from air quality and meteorological monitoring networks). Secondly, the structure of statistical models is often simpler than that of deterministic models, so they can be implemented and used more easily by non-experts. However, statistical models are not portable from site to site, since they are developed and calibrated on local data.

Several statistical approaches have been proposed in the literature (Gardner and Dorling, 1998; Finzi et al., 1998; Nunnari et al., 1998; Nunnari et al., 2001). For modelling SO₂ concentrations, artificial intelligence (AI) based techniques, namely artificial neural network based models and neuro-fuzzy models, seem to be the most promising. However, a systematic inter-comparison experiment utilising these approaches with other emerging techniques such as wavelet based approaches, generalised linear models, local prediction in phase-space and generalised additive models, has never been carried out.

The present paper describes the results obtained by inter-comparing several statistical techniques for modelling SO₂ concentration at a point, such as neural networks, fuzzy logic, generalised additive techniques and other recently proposed statistical approaches. The inter-comparison is the result of a collaboration between the partners involved in the APPETISE project, funded under the EC Information Societies and Technologies (IST) Framework V programme. One of the main aims of this work is to give guidelines, based on the results of the modelling inter-comparison, for designing a warning system for air quality assessment. The results of a similar inter-comparison for surface ozone were presented by Schlink et al. (2003), while inter-comparisons for nitrogen oxides and particulate modelling are reported by Kukkonen et al. (2002) and Partanen et al. (in press).

Section snippets

Test areas and targets

Two different cases for study were selected: the Siracusa industrial area, in Italy, where the pollution is largely due to industrial emissions and the Belfast urban area, in the UK, where pollution is mainly due to domestic heating.

The Siracusa industrial area is situated in the south-east of Sicily (Italy). In the post-war period, one of the largest concentrations of petrochemical industries in Europe developed here and it is considered to be an area of high environmental risk. The air

Inter-compared techniques

The following techniques were selected for the inter-comparison exercise:

– ANN: artificial neural networks with backpropagation training algorithm;
– MNN: artificial neural networks with maximum likelihood cost function and conjugate gradient training algorithm;
– WAG: wavelet functions with genetic algorithms;
– NFU: neuro-fuzzy techniques;
– GAM: generalised additive models;
– LPH: local prediction in phase-space;
– LIN: linear time-series model;
– PER: persistence model.

A detailed description of these techniques is beyond
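As an illustration of the two simplest entries in the list above, the following minimal sketch shows a persistence forecast and a first-order linear autoregression. The function names are hypothetical, and the AR(1) form is only an assumption made here for illustration: the text does not specify the structure or order of the LIN model used in the inter-comparison.

```python
def persistence_forecast(series):
    """One-step-ahead persistence: the forecast for step t+1 is the value at t.

    The returned list is aligned so that forecasts[i] predicts series[i + 1].
    """
    return list(series[:-1])


def fit_ar1(series):
    """Ordinary least-squares fit of x[t+1] = a * x[t] + b.

    Assumption: a first-order model, chosen here only as the simplest
    example of a linear time-series model.
    """
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("series is constant; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def ar1_forecast(last_value, a, b):
    """Forecast the next value from the most recent observation."""
    return a * last_value + b
```

A persistence model needs no calibration, which is why it is commonly used as the baseline that any trained model is expected to beat.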

Performance indices

In order to inter-compare the considered statistical approaches objectively, several performance indices were taken into account. We have grouped these indices into two separate sets: (1) global fit indices, i.e. those that measure the fit of the overall time series (for instance, the RMSE), and (2) exceedence indices, i.e. those that measure the capability of a given model to predict critical episodes (for instance, the SP index). A list
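The distinction between the two groups of indices can be sketched as follows. The RMSE is the standard root mean squared error. The second function is a generic hit rate (the fraction of observed exceedences that the model also predicted); it is an assumption used here to illustrate what an exceedence index measures and is not necessarily the definition of the SP index used in the paper. The function names and the threshold value in the usage note are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Global fit index: root mean squared error over the whole series."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def exceedence_hit_rate(observed, predicted, threshold):
    """Exceedence-type index (illustrative assumption, not the paper's SP).

    Returns the fraction of observed exceedences of `threshold` that were
    also predicted as exceedences, or None if no exceedence was observed.
    """
    n_observed = sum(1 for o in observed if o > threshold)
    if n_observed == 0:
        return None
    hits = sum(
        1 for o, p in zip(observed, predicted)
        if o > threshold and p > threshold
    )
    return hits / n_observed
```

For example, with observed values `[100, 130, 90, 150]`, predictions `[110, 120, 95, 160]` and an illustrative threshold of 125, the RMSE is about 9.0 while the hit rate is only 0.5: a model can fit the overall series reasonably well and still miss half of the critical episodes.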

Structure of prediction models

It is necessary to stress here that finding the most appropriate structure for a statistical air quality prediction model (i.e. the exogenous inputs) is perhaps one of the major problems for modellers. First of all, the candidate variables are often numerous and not necessarily known a priori. Moreover, the link between the pollutant concentration and the exogenous inputs is non-linear and depends on the geographical location of the measurement point. Further, the selected

Missing data interpolation

In order to provide a common set of data during the inter-comparison exercise, the problem of missing data was first addressed leading to the implementation of appropriate procedures for missing data interpolation. This pre-processing phase was carried out year by year on the whole dataset (including meteorological and pollution data). It is also necessary to observe that some of the modelling techniques considered (for example, LPH) require imputed data, as they are not able to handle missing
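The kind of pre-processing described here can be illustrated with a minimal gap-filling sketch. The text does not state which interpolation procedures were actually used, so linear interpolation between the nearest valid neighbours is only an assumption, and the function name is hypothetical. Missing samples are represented as `None`.

```python
def fill_gaps_linear(values):
    """Fill None entries by linear interpolation between valid neighbours.

    Assumption: gaps at the start or end of the series, which have only
    one valid neighbour, are filled with that nearest valid value.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        return filled  # nothing to interpolate from

    # Leading and trailing gaps: copy the nearest valid value.
    for i in range(known[0]):
        filled[i] = filled[known[0]]
    for i in range(known[-1] + 1, len(filled)):
        filled[i] = filled[known[-1]]

    # Interior gaps: weight the two bounding valid values by distance.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            w = (i - left) / span
            filled[i] = filled[left] * (1 - w) + filled[right] * w
    return filled
```

Linear interpolation is reasonable only for short gaps. For long gaps it smooths out exactly the peaks that matter for exceedence prediction, which is one reason the use of interpolated time series is treated as an issue in its own right.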

Results and discussion

For the sake of brevity, results are reported extensively here only for the DMEA target, the only one predicted with appreciable accuracy. However, numerical results and considerations are also given for the DMAX and HMAX targets. Results for DMEA models of MF type are reported in Table 7a and b for Melilli and Belfast, respectively, while the performance of NMF models is shown in Table 7c and d. As expected, results show that MF models perform better than NMF

Conclusions

In this paper, some of the most promising statistical techniques for the prediction of SO₂ concentration at a point were compared. The results show that no single modelling approach generates optimum results across the full range of performance indices considered. However, assuming that, in view of the implementation of a warning system for air quality control, approaches that work better in the prediction of critical episodes must be preferred, the artificial

Acknowledgements

The support of the European Commission’s Framework V IST Programme (contract no. IST-1999-11764) is gratefully acknowledged. We would like to thank the Province of Siracusa, the UK Air Quality Archive and the British Atmospheric Data Centre for providing all the datasets considered in this work.

References (64)

M. Boznar et al. A neural network-based method for short-term predictions of ambient SO₂ concentrations in highly polluted industrial areas of complex terrain. Atmospheric Environment (1993)
J.M. Davis et al. A model for predicting maximum and 8 h average ozone in Houston. Atmospheric Environment (1999)
J.M. Davis et al. Modelling the effects of meteorology on ozone in Houston using cluster analysis and generalised models. Atmospheric Environment (1998)
S.R. Dorling et al. Maximum likelihood cost functions for neural networks models of air quality data. Atmospheric Environment (2003)
M.W. Gardner et al. Artificial neural networks (the multilayer perceptron): a review of applications in the atmospheric sciences. Atmospheric Environment (1998)
H.F. Lins. Trend analysis of monthly sulfur dioxide emissions in the conterminous United States. Atmospheric Environment (1987)
G. Nunnari et al. The application of neural techniques to the modelling of time series of atmospheric pollution data. Ecological Modelling (1998)
U. Schlink et al. A rigorous inter-comparison of ground-level ozone predictions. Atmospheric Environment (2003)
H.D.I. Abarbanel. Analysis of Observed Chaotic Data (1996)
P. Arena et al. A neural architecture to predict pollution in industrial areas
A.R. Barron. Neural net approximation
A.R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory (1993)
A. Benveniste et al. Wavelets in identification. Proceedings of the SYSIS’94 (1994)
G.E.P. Box et al. Time Series Analysis, Forecasting and Control (1976)
G.E.P. Box et al. Time Series Analysis, Forecasting and Control (1994)
L. Breiman et al. Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association (1985)
W.S. Cleveland. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association (1979)
G. Cybenko. Approximation by superpositions of a sigmoidal function (1989)
P. Eklund, F. Klawonn. Neural fuzzy logic programming. IEEE Transactions on Neural Networks, vol. 3 (5), pp.... (1992)
T. Fawcett. Using rule sets to maximize ROC performance
G. Finzi et al. Real-time ozone episode forecast: a comparison between neural network and grey-box models
R. Fletcher. Practical Methods of Optimization (1987)
F.D. Foresee et al. Gauss–Newton approximation to Bayesian regularization
R.J. Foxall et al. Error functions for predicting episodes of poor air quality
R.O. Gilbert. Statistical Methods for Environmental Pollution Monitoring (1987)
D.E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning (1989)
P. Grassberger et al. Characterisation of strange attractors. Physical Review Letters (1983)
G. Gschwandtner et al. Historic emissions of sulfur and nitrogen oxides in the United States from 1900 to 1980. JAPCA (1986)
M.M. Gupta, D.H. Rao. On the principles of fuzzy neural network. Fuzzy Sets and Systems, vol. 61 (1), pp.... (1994)
T.J. Hastie et al. Generalized Additive Models (1986)
T.J. Hastie et al. Generalized additive models: some applications. Journal of the American Statistical Association (1987)
I. Hayashi et al. Fuzzy-Neural Networks, Soft Computing Series (1997)


¹: JANN (Java Artificial Neural Network) Tool for air pollution modelling by using multi-layer perceptron neural networks.
