Modelling SO2 concentration at a point with statistical approaches

https://doi.org/10.1016/j.envsoft.2003.10.003

Abstract

This paper reports the results of an inter-comparison of several statistical techniques for modelling SO2 concentration at a point, including neural networks, fuzzy logic, generalised additive techniques and other recently proposed statistical approaches. The inter-comparison is the result of a collaboration between some of the partners of the APPETISE project, funded under the Framework V Information Societies and Technologies (IST) programme. Two case studies were selected: the Siracusa industrial area in Italy, where pollution is dominated by industrial emissions, and the Belfast urban area in the UK, where domestic heating makes an important contribution. The different kinds of pollution (industrial/urban) and the different locations of the areas considered make the results more general. To make the inter-comparison more objective, all the modellers used the same datasets. Missing data in the original time series were filled using appropriate techniques. The inter-comparison was carried out according to the performance indices recommended by the European Topic Centre on Air and Climate Change (ETC/ACC). The targets for the prediction models were defined according to the EC regulations on limit values for sulphur dioxide. Accordingly, three kinds of target were considered: daily mean values, daily maximum values and hourly mean values. The models were tested on real cases of poor air quality. The techniques are ranked in terms of their capability to predict critical episodes, and a ranking in terms of the predictability of the three targets is also proposed. Several key issues are illustrated and discussed, such as the role of input variable selection, the use of meteorological data, and the use of interpolated time series.
Moreover, a novel approach, referred to as the technique of balancing the training pattern set, is introduced; it was successfully applied to improve the capability of ANN models to predict exceedences. The results show that no single modelling approach generates optimum results across the full range of performance indices considered. For the implementation of a warning system for air quality control, approaches that perform better in the prediction of critical episodes should be preferred; artificial neural network prediction models can therefore be recommended for this purpose. The best forecasts were achieved for daily averages of SO2, while daily maximum and hourly mean values are difficult to predict with acceptable accuracy.

Introduction

The level of sulphur emissions, mainly as SO2, has been decreasing continuously over the last 20 years in most western industrialised countries (see the data reported by Holland et al. (1999) compared with the historical data reported by Gschwandtner et al. (1986), Lins (1987) and Zannetti (1990)). However, localised SO2 problems still exist, related to local emission, meteorological and topographical factors. By contrast, sulphur emissions are increasing in the emerging industrialised countries of Eastern Europe as well as in other developing countries around the world. Hence, environmental problems associated with sulphur emissions are still far from being fully solved.

Modelling SO2 air pollution is a complex task that has drawn the attention of many scientists all over the world since the early 1960s. Unfortunately, the literature on this subject shows that a universal technique for modelling SO2 time series recorded at specific points in a given area does not exist. To address this issue, which is relevant to the problem of controlling the levels of SO2 pollution, several modelling approaches have been proposed, of both deterministic and statistical type. The literature shows (Gilbert, 1987; Zannetti, 1990) that statistical approaches are frequently considered for short-term forecasting applied to real-time control of emissions or to air quality assessment. These methods have some advantages over deterministic approaches. Firstly, they do not need data about emissions (which are sometimes unavailable, especially in real time), since they are based on air quality and meteorological measurements only (which, in turn, are largely available from air quality and meteorological monitoring networks). Secondly, the structure of statistical models is often simpler than that of deterministic models, so they can more easily be implemented and used by non-experts. However, statistical models are not portable from site to site, since they are developed and calibrated on local data.

Several statistical approaches have been proposed in the literature (Gardner and Dorling, 1998; Finzi et al., 1998; Nunnari et al., 1998; Nunnari et al., 2001). For modelling SO2 concentrations, artificial intelligence (AI) based techniques, namely artificial neural network based models and neuro-fuzzy models, seem to be the most promising. However, a systematic inter-comparison of these approaches with other emerging techniques, such as wavelet based approaches, generalised linear models, local prediction in phase-space and generalised additive models, has never been carried out.

The present paper describes the results obtained by inter-comparing several statistical techniques for modelling SO2 concentration at a point, such as neural networks, fuzzy logic, generalised additive techniques and other recently proposed statistical approaches. The inter-comparison is the result of a collaboration between the partners involved in the APPETISE project, funded under the EC Information Societies and Technologies (IST) Framework V programme. One of the main aims of this work is to give guidelines, based on the results of the modelling inter-comparison, for designing a warning system for air quality assessment. The results of a similar inter-comparison for surface ozone were presented by Schlink et al. (2003), while modelling inter-comparisons for nitrogen oxides and particulate matter are reported by Kukkonen et al. (2002) and Partanen et al. (in press).

Section snippets

Test areas and targets

Two case studies were selected: the Siracusa industrial area in Italy, where the pollution is largely due to industrial emissions, and the Belfast urban area in the UK, where pollution is mainly due to domestic heating.

The Siracusa industrial area is situated in the south-east of Sicily (Italy). In the post-war period, one of the largest concentrations of petrochemical industries in Europe developed here and it is considered to be an area of high environmental risk. The air

Inter-compared techniques

The following techniques were selected for the inter-comparison exercise:

  • ANN—artificial neural networks with backpropagation training algorithm;

  • MNN—artificial neural networks with maximum likelihood cost function and conjugate gradient training algorithm;

  • WAG—wavelet functions with genetic algorithms;

  • NFU—neuro-fuzzy techniques;

  • GAM—generalised additive models;

  • LPH—local prediction in phase-space;

  • LIN—linear time-series model;

  • PER—persistence model.

A detailed description of these techniques is beyond
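As an illustration of the simplest entry in the list above, the following is a minimal sketch of a persistence (PER) forecast, assuming the usual definition in which the forecast for a time step equals the observation a fixed number of steps earlier. The function name, the `horizon` parameter and the sample values are hypothetical and are not taken from the paper.

```python
def persistence_forecast(series, horizon=1):
    """Forecast each value as the observation `horizon` steps earlier.

    Assumed definition of a persistence model; positions with no
    earlier observation available are returned as None.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    values = list(series)
    # Shift the series forward by `horizon` and keep the original length.
    return ([None] * horizon + values)[:len(values)]


# Made-up daily mean concentrations, for illustration only.
daily_mean = [12.0, 15.5, 30.2, 28.1, 9.4]
forecast = persistence_forecast(daily_mean)
```

A baseline of this kind is useful mainly as a reference: a more elaborate model is only worth deploying if it improves on it.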

Performance indices

In order to inter-compare the considered statistical approaches objectively, several performance indices were taken into account. We have grouped these indices into two separate sets: (1) global fit indices, i.e. those that measure the fit of the overall time series (for instance, the RMSE), and (2) those that measure the capability of a given model to predict critical episodes (for instance, the SP index), referred to here as exceedence indices. A list
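The two groups of indices can be sketched as follows. This is a hedged illustration only: the snippet does not define the exceedence indices, so the code assumes that one of them is the fraction of observed exceedences that were also forecast, and that a companion index is the fraction of forecast exceedences that actually occurred. The function names, the dictionary keys and the sample data are hypothetical; the threshold of 125 is used purely as an illustrative value.

```python
import math


def rmse(observed, predicted):
    """Root mean square error over paired observations and predictions."""
    pairs = list(zip(observed, predicted))
    return math.sqrt(sum((o - p) ** 2 for o, p in pairs) / len(pairs))


def exceedance_scores(observed, predicted, threshold):
    """Count threshold exceedences and derive two simple ratios.

    Assumed definitions (not taken from the paper):
      fraction_detected = hits / observed exceedences
      fraction_realised = hits / forecast exceedences
    A ratio is None when its denominator is zero.
    """
    pairs = list(zip(observed, predicted))
    hits = sum(1 for o, p in pairs if o > threshold and p > threshold)
    n_obs = sum(1 for o, _ in pairs if o > threshold)
    n_pred = sum(1 for _, p in pairs if p > threshold)
    return {
        "hits": hits,
        "fraction_detected": hits / n_obs if n_obs else None,
        "fraction_realised": hits / n_pred if n_pred else None,
    }


# Made-up values, for illustration only.
observed = [100.0, 130.0, 140.0, 90.0]
predicted = [110.0, 128.0, 120.0, 126.0]
global_fit = rmse(observed, predicted)
episode_fit = exceedance_scores(observed, predicted, threshold=125.0)
```

Keeping the two groups separate matters because a model can achieve a low RMSE while still missing most of the rare high-concentration episodes.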

Structure of prediction models

It should be stressed that finding the most appropriate structure for a statistical air quality prediction model (i.e. the exogenous inputs) is perhaps one of the major problems for modellers. First of all, the candidate variables are often numerous and not necessarily known a priori. Moreover, the link between the pollutant concentration and the exogenous inputs is non-linear and depends on the geographical location of the measurement point. Further, the selected
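To make the notion of model structure concrete, the sketch below builds input/output patterns for a one-step-ahead predictor from lagged values of the pollutant and of a single exogenous variable. It is an assumption-laden illustration: the paper's actual input selection is not given in this snippet, and the function name, the `n_lags` parameter and the sample series are hypothetical.

```python
def build_patterns(target, exogenous, n_lags=2):
    """Build (inputs, outputs) for one-step-ahead prediction.

    Each input row holds the last `n_lags` target values followed by
    the last `n_lags` exogenous values; the output is the next target.
    """
    if len(target) != len(exogenous):
        raise ValueError("series must have the same length")
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    inputs, outputs = [], []
    for t in range(n_lags - 1, len(target) - 1):
        window = slice(t - n_lags + 1, t + 1)
        inputs.append(list(target[window]) + list(exogenous[window]))
        outputs.append(target[t + 1])
    return inputs, outputs


# Made-up series: a pollutant concentration and one exogenous variable.
so2 = [1.0, 2.0, 3.0, 4.0]
wind_speed = [10.0, 20.0, 30.0, 40.0]
X, y = build_patterns(so2, wind_speed, n_lags=2)
```

Changing which series are included and how many lags are kept is one simple way to explore alternative structures with the same training code.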

Missing data interpolation

In order to provide a common set of data for the inter-comparison exercise, the problem of missing data was addressed first, leading to the implementation of appropriate procedures for missing data interpolation. This pre-processing phase was carried out year by year on the whole dataset (including meteorological and pollution data). Note also that some of the modelling techniques considered (for example, LPH) require imputed data, as they are not able to handle missing
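The snippet does not say which interpolation procedures were used. As a minimal sketch under that caveat, the code below fills short interior gaps by linear interpolation and leaves long gaps and gaps at the edges of the series untouched. Linear interpolation, the function name and the `max_gap` parameter are all assumptions made for illustration.

```python
def fill_short_gaps(series, max_gap=3):
    """Linearly interpolate runs of None no longer than `max_gap`.

    Gaps touching either end of the series, or longer than `max_gap`,
    are left as None.
    """
    filled = list(series)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first valid value after the gap
        gap = end - start
        if start == 0 or end == n or gap > max_gap:
            continue
        left, right = filled[start - 1], filled[end]
        for k in range(start, end):
            fraction = (k - start + 1) / (gap + 1)
            filled[k] = left + fraction * (right - left)
    return filled


# Made-up hourly values with one short gap, for illustration only.
hourly = [4.0, None, None, 10.0, 8.0]
repaired = fill_short_gaps(hourly)
```

Capping the gap length is a deliberate choice in this sketch: interpolating across long outages would create smooth artificial data that a model could then learn from.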

Results and discussion

For the sake of brevity, results are reported in full here only for the DMEA target, the only one predicted with appreciable accuracy. However, numerical results and some considerations are also given for the DMAX and HMAX targets. Results for DMEA models of MF type are reported in Table 7a and b for Melilli and Belfast, respectively, while the performance of NMF models is shown in Table 7c and d. As expected, results show that MF models perform better than NMF

Conclusions

In this paper, some of the most promising statistical techniques for the prediction of SO2 concentration at a point were compared. The results show that no single modelling approach generates optimum results across the full range of performance indices considered. However, assuming that, in view of the implementation of a warning system for air quality control, approaches that work better in the prediction of critical episodes must be preferred, the artificial

Acknowledgements

The support of the European Commission’s Framework V IST Programme (contract no. IST-1999-11764) is gratefully acknowledged. We would like to thank the Province of Siracusa, the UK Air Quality Archive and the British Atmospheric Data Centre for providing all the datasets considered in this work.

References (64)

  • A.R. Barron. Neural net approximation.
  • A.R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory (1993).
  • A. Benveniste et al. Wavelets in identification. Proceedings of the SYSIS’94 (1994).
  • G.E.P. Box et al. Time Series Analysis, Forecasting and Control (1976).
  • G.E.P. Box et al. Time Series Analysis, Forecasting and Control (1994).
  • L. Breiman et al. Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association (1985).
  • W.S. Cleveland. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association (1979).
  • G. Cybenko. Approximation by superpositions of a sigmoidal function (1989).
  • P. Eklund, F. Klawonn. Neural fuzzy logic programming. IEEE Transactions on Neural Networks, vol. 3 (5), pp.... (1992).
  • T. Fawcett. Using rule sets to maximize ROC performance.
  • G. Finzi et al. Real-time ozone episode forecast: a comparison between neural network and grey-box models.
  • R. Fletcher. Practical Methods of Optimization (1987).
  • F.D. Foresee et al. Gauss–Newton approximation to Bayesian regularization.
  • R.J. Foxall et al. Error functions for predicting episodes of poor air quality.
  • R.O. Gilbert. Statistical Methods for Environmental Pollution Monitoring (1987).
  • D.E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning (1989).
  • P. Grassberger et al. Characterisation of strange attractors. Physical Review Letters (1983).
  • G. Gschwandtner et al. Historic emissions of sulfur and nitrogen oxides in the United States from 1900 to 1980. JAPCA (1986).
  • M.M. Gupta, D.H. Rao. On the principles of fuzzy neural network. Fuzzy Sets and Systems, vol. 61 (1), pp.... (1994).
  • T.J. Hastie et al. Generalized Additive Models (1986).
  • T.J. Hastie et al. Generalized additive models: some applications. Journal of the American Statistical Association (1987).
  • I. Hayashi et al. Fuzzy-Neural Networks, Soft Computing Series (1997).
1 JANN (Java Artificial Neural Network) Tool for air pollution modelling by using multi-layer perceptron neural networks.
