A SVM-based regression model to study the air quality at local scale in Oviedo urban area (Northern Spain): A case study

https://doi.org/10.1016/j.amc.2013.03.018

Abstract

This research work presents a method for modeling daily air pollution at local scale in the Oviedo urban area (Northern Spain) using the support vector machine (SVM) technique. Hazardous air pollutants, or toxic air contaminants, are substances that may cause or contribute to an increase in mortality or serious illness, or that may pose a present or potential hazard to human health. In this work, based on observed data for NO, NO2, CO, SO2, O3 and dust (PM10) for the years 2006, 2007 and 2008, the support vector regression (SVR) technique is used to build a nonlinear dynamic model of air quality in the urban area of the city of Oviedo (Spain). One main aim of this model was to make a preliminary estimate of the dependence between primary and secondary pollutants in the city of Oviedo. A second main aim was to determine the factors with the greatest bearing on air quality, with a view to proposing health and lifestyle improvements. The United States National Ambient Air Quality Standards (NAAQS) establish limit values for the main pollutants in the atmosphere in order to protect human health; these pollutants are known as the criteria pollutants. The SVR fit captures the central idea of statistical learning theory in order to obtain a good forecast of the dependence among the main pollutants in the city of Oviedo. Finally, the conclusions of the study are presented on the basis of these numerical calculations, which apply the SVR technique to the experimental data.

Introduction

Air pollution is a key factor in the environmental problems of metropolitan cities [1], [2], [3], [4]. Many air pollution indicators affect human health [1], [5]. Information on atmospheric pollutants such as CO, NO, NO2, SO2, O3 and particulate matter (PM10) is therefore increasingly important because of their harmful effects on human health [1], [2], [6]. Automatic measurement of the concentrations of these pollutants provides an instantaneous record of harmful pollution, which can be used to inform or warn local inhabitants of incoming danger. To that end, the EU and many national environmental agencies have set standards and air quality guidelines for allowable levels of these pollutants in the air [7], [8], [9]. If the concentration levels of these indicators exceed the air quality guidelines, short-term and chronic human health problems may occur [10].

As might be suspected, air is never perfectly clean [7]. In this sense, air pollution is a continuing threat to our health and welfare [9]. An average adult male consumes about 13.5 kg of air each day compared with about 1.2 kg of food and 2 kg of water. Therefore, the cleanliness of air should certainly be as important to humans as the cleanliness of their food and water.

This research work builds a model of daily air pollution at local scale in the Oviedo urban area that would be suitable for use by the authority responsible for air quality regulation. An artificial neural network (ANN), usually termed a neural network (NN), is a mathematical or computational model based on the structure and functional aspects of biological neural networks. Artificial neural networks [11] of the multilayer perceptron (MLP) type have frequently been used as pollution models in recent years [6], [12], [13], [14], [15], [16], [17], [18], [19]. In this research work, a system based on support vector machines (SVMs) is proposed because of their versatility in successfully tackling complex and highly nonlinear problems [20], [21], [22]. SVM networks are built to predict each of the pollutants considered here: CO, NO, NO2, SO2, O3 and dust.

Like conventional feed-forward (FF) neural networks (NNs), the SVM has been used by researchers to solve classification and regression problems [23], [24], [25], [26], [27], [28]. Possessing a similar universal approximation ability, SVR can also be used to model nonlinear processes, just as conventional NNs are. In this paper, SVR is used as a new tool to build a model of the air quality in the city of Oviedo (Spain). Compared with FF NN models, the SVR model has certain advantages. First, training the SVR leads to a global optimum, because SVR is formulated as a convex quadratic optimization problem, for which a global optimum exists [20], [21], [22], [27], [29], [30]. The training of FF NNs, by contrast, may become trapped at a local minimum. Mathematically, therefore, the SVR model has more attractive properties than the NN model. The second advantage is that the design and training of the SVR model are more straightforward and systematic than those of the NN model. The third advantage is that good generalization is easier to achieve with SVR than with NNs. Finally, the SVR is a type of model that is optimized so that prediction error and model complexity are simultaneously minimized. In other words, the formulation of SVR captures the main insight of statistical learning theory: good generalization is obtained by controlling both training error and model complexity, that is, by explaining the data with a simple model [23], [24], [25], [26], [27], [28], [29], [30].
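
For reference, the trade-off between prediction error and model complexity can be made concrete with the standard ε-insensitive SVR primal problem from the literature. This excerpt does not show the paper's own notation, so the symbols below (weight vector w, bias b, feature map φ, regularization constant C, slack variables ξ and ξ*, and n training pairs (x_i, y_i)) are assumptions chosen for illustration.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}} \quad
  & \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right) \\
\text{subject to} \quad
  & y_i - \langle w, \phi(x_i) \rangle - b \le \varepsilon + \xi_i, \\
  & \langle w, \phi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^{*}, \\
  & \xi_i \ge 0, \quad \xi_i^{*} \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
```

The first term penalizes model complexity and the second penalizes training errors larger than ε. Since the objective is a convex quadratic and the constraints are linear, any local minimum is also a global one, which is the property contrasted above with the training of FF NNs.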

Oviedo is the capital city of the Principality of Asturias in northern Spain. It is also the name of the municipality that contains the city. Oviedo, which is the administrative and commercial center of the region, also hosts the annual Prince of Asturias Awards. This prestigious event, held in the city’s Campoamor theater, recognizes international achievement in eight categories. Oviedo University’s international campus attracts many foreign scholars from all over the globe. The city of Oviedo has a population of 221,202 inhabitants. It covers a land area of 186.65 km², lies at an altitude of 232 m above sea level and has a density of 1185.12 inhabitants per square kilometer. The climate of Oviedo, as in the rest of northwest Spain, is more varied than that of the southern parts of Spain. Summers are generally humid and warm, with considerable sunshine but also some rain. Winters are cold and very rainy, with some very cold snaps. The cold is especially felt in the mountains surrounding the city of Oviedo, where snow is present from October until May. Both rain and snow are regular features of Oviedo’s winters. In addition, the Soto de Ribera coal-fired power plant is located seven kilometers south of the city of Oviedo (see Fig. 1). This plant provides most of the electrical energy used in the city of Oviedo.

Fig. 1 presents the geographical location of the three meteorological stations and the Soto de Ribera coal-fired power plant. The plant is located seven kilometers south of the city of Oviedo, in the district of Ribera de Arriba, at an altitude of 126.50 m above sea level.

The dataset used in this study was collected over three years, from 2006 to 2008. The numerical results based on the application of the SVR technique indicate very good accuracy in the monthly modeling of all the pollutants considered. These detailed results are presented and discussed throughout the study.

Sources and types of air pollution

Primary pollutants are emitted directly from identifiable sources. They pollute the air immediately upon being emitted. Secondary pollutants, in contrast, are produced in the atmosphere when certain chemical reactions take place among primary pollutants. The chemicals that make up smog are important examples. In some cases, the impact of primary pollutants on human health and the environment is less severe than the effects of the secondary pollutants they form [10], [27], [31], [32], [33].

Mathematical model

The SVM is a learning method with a theoretical root in statistical learning theory [23], [24], [25]. The SVM was originally developed for classification, and was later generalized to solve regression problems [20], [21], [22]. This method is called support vector regression (SVR). The model produced by support vector classification only depends on a subset of the training data, because the cost function for building the model does not care about training points that lie beyond the margin.
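
The same sparsity carries over to SVR, where training points that fall inside the ε-tube around the fitted function do not become support vectors. The following is a minimal sketch of that property, assuming scikit-learn's `SVR` as the implementation and a synthetic noisy sine curve as data. Neither is taken from the paper, and the values of `C` and `epsilon` are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic 1-D data (NOT the Oviedo measurements): a noisy sine curve.
X = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X).ravel() + 0.05 * rng.standard_normal(200)

# Illustrative hyperparameters; a wide tube keeps the example sparse.
epsilon = 0.2
model = SVR(kernel="rbf", C=10.0, epsilon=epsilon)
model.fit(X, y)

# Indices of the training points the fitted model actually depends on.
n_support = len(model.support_)

# Absolute residuals of the points that are NOT support vectors.
is_support = np.zeros(len(X), dtype=bool)
is_support[model.support_] = True
non_sv_residuals = np.abs(y[~is_support] - model.predict(X[~is_support]))

print(f"{n_support} of {len(X)} training points are support vectors")
print(f"largest residual among the others: {non_sv_residuals.max():.3f}")
```

Points that are not support vectors lie within the tube (up to solver tolerance), so removing them from the training set would leave the fitted function unchanged.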

Results and discussion

Because the relationship among pollutants is highly nonlinear and very complex, more accurate analysis tools based on statistical learning were required, such as the above-mentioned support vector regression (SVR) and the well-known multilayer perceptron (MLP) technique [11], [12], [13], [25], [26], [27]. For all the normalized data samples, a tolerance value ε=0.01 was used. The fit results, taking several types of kernels, are as follows [20], [24],
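
The fit results themselves are not included in this excerpt. Purely as an illustration of the setup described above (normalized samples, ε = 0.01, several kernel types), the following is a minimal sketch assuming scikit-learn and synthetic data. The predictors, the target function, the value of `C`, the train/test split and the choice of kernels are all hypothetical and do not reproduce the Oviedo dataset or the authors' configuration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

rng = np.random.default_rng(42)

# Synthetic stand-in for daily records (NOT the Oviedo data): five
# hypothetical predictor pollutants and one nonlinearly dependent target.
n_days = 300
predictors = rng.uniform(0.0, 100.0, size=(n_days, 5))
target = (
    0.5 * predictors[:, 0]
    + 20.0 * np.sin(predictors[:, 1] / 15.0)
    + 0.01 * predictors[:, 2] * predictors[:, 3]
    + rng.normal(0.0, 1.0, size=n_days)
)

# Hold out the last 20% of days; fit the scalers on the training part only.
split = int(0.8 * n_days)
x_scaler = MinMaxScaler().fit(predictors[:split])
y_scaler = MinMaxScaler().fit(target[:split].reshape(-1, 1))

X_train = x_scaler.transform(predictors[:split])
X_test = x_scaler.transform(predictors[split:])
y_train = y_scaler.transform(target[:split].reshape(-1, 1)).ravel()
y_test = y_scaler.transform(target[split:].reshape(-1, 1)).ravel()

# Same tolerance for every kernel; C = 10 is an arbitrary illustrative value.
scores = {}
for kernel in ("linear", "poly", "rbf"):
    model = SVR(kernel=kernel, C=10.0, epsilon=0.01)
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)  # coefficient of determination

for kernel, r2 in scores.items():
    print(f"{kernel:>6}: R^2 = {r2:.3f}")
```

Because ε is expressed in the units of the target, a tolerance of 0.01 is only meaningful once the target has been normalized, here to roughly the interval [0, 1].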

Conclusions

This study presents the application of the SVM technique to estimate highly nonlinear source-receptor relationships between precursor emissions and pollutant concentrations. Such a model is identified for use in solving the multi-objective air quality control problem at local scale in the Oviedo urban area (Northern Spain).

The SVR technique is a modeling approach optimized so that the prediction error and model complexity are simultaneously minimized. Due to its universal

Acknowledgements

The authors wish to acknowledge the computational support provided by the Departments of Mathematics, Construction and Computer Science at the University of Oviedo, as well as the pollutant dataset for the city of Oviedo supplied by the Section of Industry and Energy of the Government of the Principality of Asturias. This paper has been funded by the Government of the Principality of Asturias through funds from the Programme of Science, Technology and Innovation (PCTI) of Asturias 2006–2009,

References (46)

  • A. Suárez Sánchez et al., Application of a SVM-based regression model to the air quality study at local scale in the Avilés urban area (Spain), Math. Comput. Modell. (2011)
  • B. Üstün et al., Facilitating the application of support vector regression by using a universal Pearson VII function based kernel, Chemom. Intell. Lab. Syst. (2006)
  • P.J. García Nieto, Parametric study of selective removal of atmospheric aerosol by coagulation, condensation and gravitational settling, Int. J. Environ. Health Res. (2001)
  • A. Akkoyunku et al., Evaluation of air pollution trends in Istanbul, Int. J. Environ. Pollut. (2003)
  • T. Godish, Air Quality (2004)
  • C.D. Cooper et al., Air Pollution Control (2002)
  • L.K. Wang et al., Air Pollution Control Engineering (2004)
  • F.K. Lutgens et al., The Atmosphere: An Introduction to Meteorology (2001)
  • S. Haykin, Neural Networks. Comprehensive Foundation (1999)
  • M. Bianchini, E. Di Iorio, M. Maggini, C. Mocenni, A. Pucci, A cyclostationary neural network model for the prediction...
  • F. Karaca et al., NN-AirPol: a neural-network-based method for air pollution evaluation and control, Int. J. Environ. Pollut. (2006)
  • E. Aguirre-Basurko et al., Regression and multilayer perceptron-based models to forecast hourly O3 and NO2 levels in the Bilbao area, Environ. Modell. Software (2006)
  • J. Shawe-Taylor et al., Kernel Methods for Pattern Analysis (2004)