A systematic comparison of different machine learning models for the spatial estimation of air pollution

Cerezuela-Escudero, Elena; Montes-Sanchez, Juan Manuel; Dominguez-Morales, Juan Pedro; Duran-Lopez, Lourdes; Jimenez-Moreno, Gabriel

doi:10.1007/s10489-023-05109-y

Open access
Published: 31 October 2023

Volume 53, pages 29604–29619, (2023)

Applied Intelligence

Elena Cerezuela-Escudero ORCID: orcid.org/0000-0003-0176-7863¹,
Juan Manuel Montes-Sanchez¹,
Juan Pedro Dominguez-Morales¹,
Lourdes Duran-Lopez¹ &
…
Gabriel Jimenez-Moreno¹

Abstract

Air pollutants harm human health and the environment. Nowadays, deploying an air pollution monitoring network in many urban areas could provide real-time air quality assessment. However, these networks are usually sparsely distributed and the sensor calibration problems that may appear over time lead to missing and wrong measurements. There is an increasing interest in developing air quality modelling methods to minimize measurement errors, predict spatial and temporal air quality, and support more spatially-resolved health effect analysis. This research aims to evaluate the ability of three feed-forward neural network architectures for the spatial prediction of air pollutant concentrations using the measures of an air quality monitoring network. In addition to these architectures, Support Vector Machines and geostatistical methods (Inverse Distance Weighting and Ordinary Kriging) were also implemented to compare the performance of neural network models. The evaluation of the methods was performed using the historical values of seven air pollutants (Nitrogen monoxide, Nitrogen dioxide, Sulphur dioxide, Carbon monoxide, Ozone, and particulate matters with size less than or equal to 2.5 $\upmu $m and to 10 $\upmu $m) from an urban air quality monitoring network located at the metropolitan area of Madrid (Spain). To assess and compare the predictive ability of the models, three estimation accuracy indicators were calculated: the Root Mean Squared Error, the Mean Absolute Error, and the coefficient of determination. FFNN-based models are superior to geostatistical methods and slightly better than Support Vector Machines for fitting the spatial correlation of air pollutant measurements.

1 Introduction

Exposure to air pollution is a risk factor for diseases such as stroke, asthma, cancer and chronic obstructive pulmonary disease. It is associated with noxious effects on human health and it is especially harmful to vulnerable groups such as children, the elderly and patients with respiratory and cardiovascular diseases [1,2,3,4]. Air pollutants not only severely impact public health, but also the climate and ecosystems because several of them are greenhouse gases [1, 5]. Many of the air pollutants are also sources of greenhouse gas emissions. Considering the significance of air quality on health and the environment, the World Health Organization (WHO) has developed guidelines to improve air quality by setting limits on the concentrations of various air pollutants: ozone (O3), nitrogen dioxide (NO2), sulphur dioxide (SO2), carbon monoxide (CO), particulate matter with size less than or equal to 2.5 $\upmu $m (PM2.5) and particulate matter with size less than or equal to 10 $\upmu $m (PM10) [5].

Air quality monitoring is an important task for governments to provide information on potential health risks and determine appropriate environmental management policies. The development of air pollution sensing technology in the last few decades and the support of government agencies have contributed to building air quality monitoring networks in many urban areas with the aim of analysing and publishing the concentrations of several air pollutants that are potentially harmful to health^{Footnote 1}^{Footnote 2}. However, these networks are usually sparsely distributed, and the sensor calibration problems that may appear lead to missing and wrong measurements [6,7,8,9]. There is an increasing interest in developing air quality modelling methods to minimize measurement errors, predict spatial and temporal air quality, and support more spatially-resolved health effect analyses [8,9,10,11,12,13].

Air pollution modelling follows two different approaches. The first approach consists in using the deterministic mathematical modelling of atmospheric pollutant dispersion. The second approach, on the other hand, consists in employing statistical models based on historical air quality data, and, in some cases, meteorological and geographic information too.

The deterministic mathematical modelling involves the simulation of pollutant dispersion and transport mechanisms using emission values in industrial and urban areas, physical and chemical processes in the atmosphere, meteorological data, and geographic and topological information. The deterministic methods that are most present in the scientific literature are the “Community Multiscale Air Quality (CMAQ) model” [14, 15], “Weather Research and Forecasting model with Chemistry (WRF-Chem)” [16], and “Nested Air Quality Prediction Modelling System (NAQPMS)” [17, 18]. The deterministic modelling has limitations due to the enormous number of pollution sources and the fact that air distribution is influenced by several complex physical/chemical processes that require many variables.

The statistical approach takes advantage of the spatial and temporal correlations that are present in the air pollution concentration time series and formulates models that simulate these dependencies with a high degree of accuracy. Several methodologies have been developed along this approach, including classical statistics [19], artificial intelligence [8, 9, 11,12,13, 20, 21] and geostatistical techniques [6, 7, 22].

The application of Artificial Neural Networks (ANNs) has been frequently used to forecast air quality. Some recent articles [7, 13, 20, 21, 23,24,25,26] use the historical values of various pollutants to predict the air quality index and/or air pollutant concentrations. Several of them use meteorological data too. Machine and deep learning methods show a more remarkable ability to simulate non-linear systems because of their self-learning, self-organizing, and self-adaptation features.

In contrast, fewer studies have used ANNs for the spatial estimation of air pollution [6, 8, 11, 12, 27]. The study in [8] evaluates the use of a Back-Propagation Neural Network (BPNN) for modelling the spatial distribution of five air pollutants (NO2, O3, SO2, PM2.5 and PM10). The authors of [6] proposed machine learning and geostatistical methods to predict PM2.5 pollution levels. Some of these studies applied deep learning methods to extract complex and non-linear spatiotemporal correlations [11, 12]. The next section describes in more detail the literature on machine learning and deep learning methods for predicting air pollution.

The main objective of this research was to develop an ANN-based system for modelling the spatial characteristics of air pollutant concentrations measured at an urban air quality monitoring network. To estimate the air pollutant value at the target site based on the measurements collected at nearby locations, we applied three Feed-Forward Neural Network (FFNN) architectures for regression (with one, two, and three fully connected hidden layers), a Support Vector Machine (SVM) and geostatistical methods. The evaluation of the methods was performed by using the historical values of seven air pollutants (Nitrogen monoxide (NO), NO2, O3, SO2, CO, PM2.5, and PM10) collected at the urban air quality monitoring network located at the greater metropolitan area of Madrid (Spain). For assessing and comparing the predictive ability of the models, three estimation accuracy indicators were calculated: the Root Mean Squared Error (RMSE), the Mean Absolute Error (MAE) and the coefficient of determination (R²).

The main contributions of this work include the following:

A broad analysis of the most common pollutants evaluated on a real dataset consisting of samples obtained from the air quality monitoring network deployed in the city of Madrid (Spain).
An FFNN-based air pollution spatial estimation system able to accurately predict NO2, NO, SO2, O3, PM2.5 and PM10 concentrations.
A systematic comparison between two geostatistical models, SVM, and three FFNN architectures (one-hidden FFNN, bi-layer FFNN, and tri-layer FFNN) for evaluating the prediction of the concentration of seven air pollutants (NO2, NO, CO, SO2, O3, PM2.5 and PM10).

The present work is structured as follows. The following section reviews literature related to machine learning and deep learning methods applied to air quality prediction. Section 3 presents the study area and the air pollution dataset used in the experiments. Section 4 describes the methods applied to estimate air pollutant concentrations: Inverse Distance Weighting (IDW), Ordinary Kriging (OK), SVM and FFNN. Section 5 presents every phase of the experiments. Section 6 describes the results of the experiments performed. In Section 7 we discuss our results. Finally, the conclusions are presented in Section 8.

2 Related work

Machine learning methods are widely used to extract non-linear correlations in air pollutant concentration data. Deploying air pollution monitoring stations in urban areas generates a massive amount of collected data, creating databases suitable for statistical analysis. In addition, machine learning methods do not require a deep understanding of the dynamic and chemical processes between air pollutants and other relative atmospheric variables.

ANNs have been improved through years of research and applications, bringing more evolved versions to air pollution prediction. For example, [27] proposed a BPNN to estimate the hourly concentrations of NO2 in unsampled locations at Algeciras Bay (Spain) using the historical values of NO2 concentrations at fourteen monitoring stations and the distances to each monitoring station. The prediction system has a first stage that applies IDW interpolation and a multiple linear regression method to produce air pollution maps that are used as input to the BPNN. The best result achieved was an $R^{2}$ value of 0.76. They also evaluated the methods separately, showing the BPNN to be the best prediction method at most monitoring stations. In contrast, [6] used several methods to obtain daily estimates of PM2.5 concentration across the contiguous US, and the results showed a better predictive performance of spatial statistical models over machine learning methods. The 829 monitoring stations take measurements every 1, 3, or 6 days, with only approximately 15% of monitors sampling daily, implying an irregular and sparse dataset. In other recent research, [8] applied a BPNN for spatially estimating pollutant concentrations in the metropolitan city of Athens in Greece. Five pollutants were estimated (NO2, O3, PM10, PM2.5 and SO2); the $R^{2}$ values for O3 and PM10 were above 0.87, and those for NO2, PM2.5, and SO2 were 0.76, 0.69 and 0.55, respectively.

Deep learning has become increasingly widely used in air quality prediction because of its ability to extract complex and non-linear spatiotemporal correlations on large datasets. Many researchers employed a Long Short Term Memory (LSTM) network for modelling the complex and non-linear temporal correlation of the historical values of the pollutants [9, 20, 25, 28, 29]. LSTM is an enhanced version of the Recurrent Neural Network for handling long-time sequence data. The recent studies [11, 24, 30] have proposed combined models for air quality prediction based on Convolutional Neural Networks (CNNs) to extract the spatial characteristics, and LSTMs to predict future air pollution concentrations. [24] developed a CNN-LSTM method for predicting the next day’s daily average PM2.5 concentration in Beijing City. [30] proposed an attention-based CNN-LSTM multilayer structure to predict the PM2.5 concentration in the next 72 hours at Beijing-Tianjin-Hebei region. This research analysed the historical air quality and meteorological data of 100 monitoring stations for spatiotemporal correlation. [11] proposed spatiotemporal forecasting models of Beijing’s Air Quality Index. Four methods (CNN, LSTM, CNN-LSTM, BPNN) were evaluated to extract the spatiotemporal characteristics of air quality concentration data (hourly PM2.5, PM10, SO2, NO2, O3, and CO) and the relations with meteorological and spatiotemporal data. The method that showed the best performance in next-hour forecasting was the CNN-LSTM method. [12] developed a spatiotemporal air quality prediction model based on LSTMs. The input data were Beijing’s historical concentrations of PM2.5, SO2, NO2, O3, and CO and meteorological data. The output was the concentration sequence of PM2.5, CO, NO2, O3, and SO2 at four monitoring sites. The model’s prediction accuracy is high, as shown by the best $R^{2}$ value of the four analysed sites: 0.939 for PM2.5, 0.847 for CO, 0.875 for NO2, 0.935 for O3, and 0.809 for SO2. 
Some researchers took advantage of the CNN’s ability to process sequence-structure data for air pollutant concentration prediction. For example, [31] used a five-layer CNN to extract the temporal correlation from historical observation data and predict the ozone concentration in the next 24 hours in an urban area. [7] applied the IDW method to interpolate air quality and weather data collected in South Korea and then used the interpolation as input of the CNN to predict PM2.5 and PM10 concentrations. The results show an effective prediction performance with an $R^{2}$ higher than 0.97.

In our research, we have deployed and compared six air quality prediction methods present in the recent literature: spatial-based interpolation (IDW), geostatistical model (Kriging), machine learning (SVM and FFNN), and deep learning (FFNN with 2 - 3 hidden layers). Each method is evaluated for extracting spatial dependencies of seven air pollutants: NO2, NO, CO, SO2, O3, PM2.5, and PM10. Therefore, we have broadly compared different methods to predict the concentrations of several air pollutants that are generated by various sources and that have different behaviours. The analysis of so many air pollutants is rare in the literature. The urban area selected has an air quality monitoring network with a spatial density and a sample time higher than or similar to those of recent studies.

3 Study area and dataset

The area of study is the city of Madrid, which is the capital of Spain and the largest and most populated metropolitan area of the country. The population of Madrid’s province grew from 6.446 million in 2016 to 6.751 million in 2021^{Footnote 3}. The elevation at its centre (40$^{\circ }$ 25’ N, 3$^{\circ }$ 41’ W) is 657 m. The expected mean temperature in the Madrid area ranges from 9.8 $^{\circ }$C in January to 32.1 $^{\circ }$C in July, with cold winters and hot summers. Spring and autumn are the seasons with the most expected rainy days, while the summer months are usually dry and sunny^{Footnote 4}.

Madrid’s air pollution levels are high, although they have been effectively reduced since the activation of air quality policies in 2011 [32]. The study in [32] analysed hourly time series of four air pollutants (NO2, O3, PM10, and SO2) monitored in the Madrid urban area from 2001 to 2017 using a two-stage method: first, a Hidden Markov Model was used to characterize the air pollution at temporal scales; then, the spatial distribution was analysed by combining the interpolation results of Ordinary Kriging and Inverse Distance Weighting. [32] concludes that the spatial analysis of air pollution is challenging due to meteorological and physical factors and the regional contributions originating in adjacent municipalities. Human activity is not the only cause of bad air quality: other events such as Saharan dust intrusions also have an impact by raising PM levels [32, 33]. The research in [34] examined the effects of local road traffic, meteorological conditions, and temporal variables on air pollution in Madrid. Its results showed that air pollutant levels were weakly linked to local vehicular emissions because various elements affect the pollutant concentration, mainly meteorological agents, topography, tree and shrub presence, building distribution, and water streams like rivers.

This study uses hourly air quality data measured between January 2016 and August 2018 by the air monitoring network of Madrid. Madrid’s city council has operated an air quality monitoring network since 2001 and publishes both real-time and historical air quality data^{Footnote 5}. We used the dataset “Air Quality in Madrid”^{Footnote 6}, which contains the processed data from the files offered by Madrid’s city council, organized by timestamp in a standard format. It consists of one file per year in which each row is timestamped and the columns are the different measurements taken at that point in time at a given station. In addition, the information regarding each station (identifier, name, address, coordinates and elevation) is available in another file. The measurements of many pollutants are available (NO, NO2, O3, SO2, CO, PM2.5, PM10, toluene, benzene, methane), but not every station is equipped with all air pollution sensors, whose number has been increasing over the years.

The study area and the distribution of 24 air quality monitoring stations deployed in Madrid are shown in Fig. 1. The maximum distance between two stations is 7 kilometres, and the average distance between stations at the urban centre is 3 kilometres. Table 1 contains the coordinates, elevation and the air pollutants measured at each air quality monitoring station between 2016 and 2018. We selected the most harmful air pollutants according to WHO and several government agencies like the U.S. Environmental Protection Agency^{Footnote 7}, European Environment Agency^{Footnote 8} and Health Canada and Environment Canada^{Footnote 9}. The criteria for the selection of the monitoring sites are the hourly data availability along with homogeneous spatial data coverage for each air pollutant.

Table 2 presents the measuring units and the range of measured concentrations for each pollutant between 2016 and 2018.

Table 1 Air quality monitoring stations, coordinates, and measured pollutants (from 2016 to 2018)

Table 2 Measuring units and range of measured concentrations for each pollutant (years 2016 to 2018)

4 Methods

In this work, we present a system based on FFNN architecture for regression to predict the air pollutant concentration in a specific location based on the measurements obtained from nearby monitored locations. To compare our proposal, we applied the following geostatistical and machine learning methods: Inverse Distance Weighting (IDW), Ordinary Kriging (OK), and Support Vector Machine (SVM).

4.1 Inverse distance weighting and kriging methods

Nearly all spatial interpolation methods share the same general estimation formula, which is as follows:

$$\begin{aligned} Z(x_{0}) = \sum _{i=1}^{n}(w_{i} z(x_{i})) \end{aligned}$$

(1)

where Z is the estimated value at the point of interest $x_{0}$, z is the observed value at the sampled point $x_{i}$, $w_{i}$ is the weight assigned to the sampled point, and n represents the number of sampled points used for the estimation. The difference between the methods depends on the formula to calculate the weights. The two most commonly used interpolation methods in the literature are IDW and OK [35, 36]. The IDW method uses the following expression for the weight:

$$\begin{aligned} w_{i} = \frac{\frac{1}{d_{i}^{p}}}{\sum _{j=1}^{n}\frac{1}{d_{j}^{p}}} \end{aligned}$$

(2)

where $d_{i}$ is the distance between $x_{0}$ and $x_{i}$, and p is an exponent that determines the influence of the values closest to the interpolated point. For OK, the weights are estimated by minimizing the variance of the prediction errors. It is assumed that the data are part of an intrinsic function z(x) with the sample variogram [37]. The sample variogram is fitted with specific known positive-definite functions. The most common functions are linear, spherical, exponential, and Gaussian.
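For reference, the minimization behind the OK weights is usually written as the following linear system, shown here in its standard textbook form rather than as taken from the original text. The symbols $\gamma$ (the fitted variogram model) and $\lambda$ (a Lagrange multiplier enforcing that the weights sum to one) are introduced only for this illustration:

$$\begin{aligned} \sum _{j=1}^{n} w_{j}\,\gamma (x_{i}, x_{j}) + \lambda&= \gamma (x_{i}, x_{0}), \quad i = 1, \dots , n \\ \sum _{j=1}^{n} w_{j}&= 1 \end{aligned}$$

Solving this system for $w_{1}, \dots , w_{n}$ gives the weights that are then used in (1).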

In our experiments, we tested the IDW model for p=1 and p=2 and, for the implementation of the OK method, we applied four function models for fitting the sample variogram: spherical, exponential, Gaussian and bounded linear.
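As an illustration of (1) and (2), the IDW estimate can be sketched in a few lines of Python. This is a minimal sketch and not the implementation used in the experiments: the function name `idw_estimate` is hypothetical, and distances are computed as planar Euclidean distances, which is an assumption (station coordinates given as latitude and longitude would first need to be projected or handled with a geodesic distance).

```python
import math


def idw_estimate(target, stations, values, p=2):
    """Estimate the value at `target` by inverse distance weighting.

    target   -- (x, y) coordinates of the point of interest (assumed planar)
    stations -- list of (x, y) coordinates of the sampled points
    values   -- observed values z(x_i) at the sampled points
    p        -- distance exponent (p=1 and p=2 are the cases tested in the text)
    """
    distances = [math.dist(target, s) for s in stations]

    # If the target coincides with a sampled point, return its observation
    # directly to avoid dividing by zero.
    for d, v in zip(distances, values):
        if d == 0.0:
            return v

    # Unnormalized weights 1 / d_i^p, then normalize so they sum to one.
    raw = [1.0 / d ** p for d in distances]
    total = sum(raw)
    weights = [r / total for r in raw]

    # Weighted sum of the observations, as in the general formula (1).
    return sum(w * v for w, v in zip(weights, values))
```

With p=2 the closest stations dominate the estimate more strongly than with p=1, which is the effect that the exponent is meant to control.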

4.2 Support vector machine

SVM is a popular machine learning tool for classification, but it can also be used for regression analysis [38, 39]. SVMs aim to provide a nonlinear function that maps a given training data set D: $\{(x_{1}, y_{1}), (x_{2}, y_{2}),...,(x_{i}, y_{i})\}$ to a high-dimensional feature space. In this space, a hyperplane is optimized to lie within a certain threshold of the selected data points, called the support vectors, and this hyperplane is used for regression prediction.

A linear epsilon-insensitive ($\epsilon $) SVM, also known as L1 loss, was used for regression. The set of training data included predictor variables and observed response values. The goal was to find a function z(x) that deviates from the observed values by no more than $\epsilon $ for each training point x and that is, at the same time, as flat as possible. The SVM with the epsilon-insensitive loss function was trained by using quadratic programming to minimise the objective function.
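The role of $\epsilon $ can be made concrete with a short sketch of the loss and of the objective that the training minimizes. This is an illustration under stated assumptions, not the solver used in the experiments: the function names are hypothetical, and the regularization constant `c` comes from the standard formulation of linear SVM regression and is not mentioned in the text.

```python
def linear_predict(weights, bias, x):
    """Linear model z(x) = w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def epsilon_insensitive_loss(observed, predicted, epsilon):
    """Errors smaller than epsilon cost nothing; larger ones grow linearly."""
    return max(0.0, abs(observed - predicted) - epsilon)


def svr_objective(weights, bias, samples, targets, epsilon, c=1.0):
    """Standard primal objective: flatness term plus penalized losses.

    The term 0.5 * ||w||^2 rewards a flat function, and `c` (an assumed
    constant, not given in the text) trades flatness against the errors
    that exceed epsilon.
    """
    flatness = 0.5 * sum(w * w for w in weights)
    penalty = sum(
        epsilon_insensitive_loss(y, linear_predict(weights, bias, x), epsilon)
        for x, y in zip(samples, targets)
    )
    return flatness + c * penalty
```

A quadratic programming solver would search for the weights and bias that minimize `svr_objective`; the sketch only evaluates it for given parameters.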

4.3 Feed-forward neural network for regression

ANNs are massively parallel interconnected networks of simple, hierarchically organized elements (artificial neurons) that attempt to interact with the environment in the same way as the biological nervous system [40]. The output of such an artificial neuron can be calculated using (3).

$$\begin{aligned} y = f\left( \sum _{i=1}^{n}(w_{i}x_{i})+b\right) \end{aligned}$$

(3)

where $x_{i}$ are the inputs, n the number of inputs, $w_{i}$ the synaptic weights, b the threshold and f the activation function. The most commonly used activation functions are linear, sigmoid, and hyperbolic tangent. Artificial neurons are arranged in several layers and connected by synaptic weights.

In this work, three feed-forward, fully connected neural networks were used for regression. The structure includes an input layer, one or more hidden layers, and an output layer. The input layer takes information (predictor data) from the domain and passes it to all the neurons of the first hidden layer. Just as the first hidden layer is fully connected to the input layer, each subsequent layer is connected to all the neurons of the previous layer. Each neuron of a fully connected layer multiplies each input by its synaptic weight and then adds the bias to the sum of the products. The result is passed through an activation function. The final fully connected layer produces the network’s output (predicted response values). The proposed architecture for the fully connected layered neural network with two hidden layers is shown in Fig. 2. An enlarged diagram of a single artificial neuron is presented separately to show its five components: inputs, synaptic weights, sum, bias, and activation function. We chose two activation functions for hidden layers: a Rectified Linear Unit (ReLU) and a sigmoid function. These functions are described in (4) and (5), respectively. Since this is a regression problem, the activation function of the output layer is the linear function f(x) = x.

$$\begin{aligned} f(x) = \left\{ \begin{matrix} x, &{} x \ge 0\\ 0, &{} x < 0 \end{matrix}\right. \end{aligned}$$

(4)

$$\begin{aligned} f(x) = \dfrac{1}{1+e^{-x}} \end{aligned}$$

(5)

The training is based on the limited-memory Broyden-Fletcher-Goldfarb-Shanno quasi-Newton algorithm (LBFGS) [41], where the mean squared error (MSE) is minimized.

We proposed three FFNN architectures for extracting the spatial characteristics of the air pollution concentration: a one-hidden-layer neural network, a bi-layer neural network, and a tri-layer neural network. In the current application, the number of monitoring sites used for predicting each pollutant defines the number of input nodes, and the output layer consists of a single fully connected neuron that returns the prediction. The number of neurons in each hidden layer is configurable. In order to compare the same structure for different air pollutants, we set 10 neurons per hidden layer for each architecture.
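The forward pass of these architectures follows directly from (3), (4) and (5) and can be sketched as below. This is a minimal sketch that only covers prediction: training with LBFGS is not shown, the uniform random initialization is an assumption made so that the example runs, and all function names are hypothetical.

```python
import math
import random


def relu(x):
    """Rectified Linear Unit, as in (4)."""
    return x if x >= 0 else 0.0


def sigmoid(x):
    """Sigmoid function, as in (5)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """Fully connected layer: each neuron computes f(sum_i w_i x_i + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def build_ffnn(n_inputs=5, hidden_layers=2, hidden_size=10, seed=0):
    """Create untrained layers as (weights, biases) pairs.

    Five inputs and ten neurons per hidden layer match the text; the
    initialization range is an arbitrary assumption for this sketch.
    """
    rng = random.Random(seed)
    sizes = [n_inputs] + [hidden_size] * hidden_layers + [1]
    network = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        network.append((weights, [0.0] * n_out))
    return network


def forward(network, inputs, hidden_activation=relu):
    """Propagate the inputs; the output neuron uses the linear f(x) = x."""
    activations = list(inputs)
    for weights, biases in network[:-1]:
        activations = dense(activations, weights, biases, hidden_activation)
    out_weights, out_biases = network[-1]
    return dense(activations, out_weights, out_biases, lambda v: v)[0]
```

Calling `build_ffnn(hidden_layers=1)`, `build_ffnn(hidden_layers=2)` and `build_ffnn(hidden_layers=3)` gives the three structures compared in this work, and passing `hidden_activation=sigmoid` to `forward` switches the hidden-layer activation.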

5 Experiments

Figure 3 shows the diagram of the experimental workflow. It was followed for the evaluation of several prediction models to estimate each air pollutant.

Firstly, we selected the target location, which should be a point with historical concentrations of the air pollutant to be estimated. We chose target stations based on their position close to the centre of Madrid and the five stations closest to the target site. However, our decision was constrained by the availability of air pollution sensors in each station. Table 3 shows the monitoring station selected as the target site and five nearby monitoring stations used to estimate the air pollutant concentrations. The average distance between the target and the selected stations is 3.26 kilometres. The coordinates and distribution of the air quality monitoring stations are shown in Table 1 and in Fig. 1, respectively.

Table 3 The selected monitoring stations and distances to the target site for each air pollutant under evaluation

Secondly, we processed the dataset to get the air pollutant concentrations from the selected stations. Each sample included the air pollutant concentrations for each station taken simultaneously. The samples with some missing values were removed. The input values for IDW and OK models are the latitudes, longitudes, and pollutant concentration at the five nearby stations. However, the input data for SVM and FFNN are the historical values of the pollutants from six stations, for training, and the pollutant concentration at the five nearby stations, for predicting.
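The sample construction described above can be sketched as follows. The data layout is an assumption made for illustration: `records` is a hypothetical dictionary that maps each timestamp to the concentrations measured at that time, keyed by station identifier, with `None` or an absent key marking a missing value.

```python
def build_samples(records, station_ids):
    """Return (timestamp, values) pairs with one value per selected station.

    Timestamps for which any selected station has a missing value are
    dropped, so every sample holds simultaneous measurements only.
    """
    samples = []
    for timestamp in sorted(records):
        row = records[timestamp]
        values = [row.get(station) for station in station_ids]
        if any(v is None for v in values):
            continue  # remove samples with missing values
        samples.append((timestamp, values))
    return samples
```

For the SVM and FFNN models, `station_ids` would list the target station together with the five nearby stations, so that the target value can serve as the response during training.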

Then, the predictive models were designed. We developed two IDW models (p=1 and p=2) for evaluating the influence of the distance in the prediction. When p=2, the method is known as the inverse distance squared weighted interpolation. To implement the OK method, four function models were applied to fit the sample variogram: spherical, exponential, Gaussian, and bounded linear. We proposed a linear $\epsilon $-SVM for regression and three different FFNN architectures: one hidden layer neural network, a bi-layer neural network, and a tri-layer neural network. The number of monitoring sites for predicting each pollutant defines the number of input nodes, a parameter set to five in our system. The output layer consists of a fully connected neuron to return the prediction. The number of neurons in each hidden layer is a configurable parameter, and we set it to ten to compare the same structure for different air pollutants. We tested two activation functions for hidden layers: a Rectified Linear Unit (ReLU) and a sigmoid function.

In the case of the SVM and FFNN models, the next phase is training. In both cases, we used 80% of the samples to train the model and the remaining 20% to test the performance of the trained model with new data. Figure 4 shows the data and processes of the training stage. The results are the trained FFNN (or SVM) and the difference and correlation measures. The IDW and OK models do not require training since the estimated value is calculated from the simultaneous measurements taken at the nearby stations and the distances to the target site. We used the test set to validate the SVM and FFNN models and the whole dataset to evaluate the IDW and OK models. The accuracy of each model is based on the comparison of the observed and predicted concentrations and the statistical analysis of the residual values.
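An 80/20 partition of the samples can be sketched as below. The text does not state whether the split was random or chronological, so the shuffling, the seed, and the function name are assumptions of this sketch.

```python
import random


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples reproducibly and cut them into two parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```

The first part would be used to fit the SVM or FFNN, and the second part would be held back to compute the accuracy indicators on data not seen during training.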

Finally, we determined the mean performance of each model by using a set of difference and correlation measures: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination ($R^{2}$). $R^{2}$ is the proportion of variation in the dependent variable that is predicted by the statistical model (range from 0 to 1). Therefore, $R^{2}$ provides information about the model’s goodness of fit. The following equations determine these error measures between observed ($x_{i}$) and predicted ($y_{i}$) values, where n represents the number of sampled points used for the estimation and $\mu _{x}$ is the mean of the observed values.

$$\begin{aligned} MAE = \frac{1}{n}\sum _{i=1}^{n}\left| x_{i} - y_{i} \right| \end{aligned}$$

(6)

$$\begin{aligned} RMSE = \sqrt{\frac{1}{n}\sum _{i=1}^{n}\left( x_{i} - y_{i} \right) ^{2}} \end{aligned}$$

(7)

$$\begin{aligned} R^2 = 1 - \frac{\sum _{i=1}^{n}\left( x_{i} - y_{i} \right) ^{2}}{\sum _{i=1}^{n}\left( x_{i} - \mu _{x} \right) ^{2}} \end{aligned}$$

(8)
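Equations (6), (7) and (8) translate directly into code. The following minimal sketch uses hypothetical function names and plain Python lists, with `observed` holding the values $x_{i}$ and `predicted` the values $y_{i}$.

```python
import math


def mae(observed, predicted):
    """Mean Absolute Error, as in (6)."""
    return sum(abs(x - y) for x, y in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root Mean Squared Error, as in (7)."""
    squared = sum((x - y) ** 2 for x, y in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, as in (8)."""
    mean_observed = sum(observed) / len(observed)
    residual = sum((x - y) ** 2 for x, y in zip(observed, predicted))
    total = sum((x - mean_observed) ** 2 for x in observed)
    return 1.0 - residual / total
```

For example, with observed values [1, 2, 3, 4] and predictions [1, 2, 3, 6], the single error of 2 gives an MAE of 0.5, an RMSE of 1.0, and an $R^{2}$ of 0.2.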

This process was followed for the seven selected pollutants: NO, NO2, O3, SO2, CO, PM2.5 and PM10.

6 Results

Six methods for modelling the spatial characteristics of the air pollutant concentrations were implemented and evaluated for each of the seven air pollutants (NO2, NO, CO, SO2, O3, PM2.5, and PM10). The experiments used historical pollutant data from monitoring stations of Madrid city’s air quality monitoring network collected from January 1, 2016, to December 31, 2018. Among the different model variants that were tested for the IDW and OK methods, only the best ones are shown in the results. We proposed three FFNN architectures: a FFNN with one fully connected layer (FCL), two FCLs, and three FCLs. Except in the prediction of SO2 concentration, we chose to apply the ReLU function in the hidden layers because it has a greater accuracy than the sigmoid one.

Tables 4, 5, 6, 7, 8, 9 and 10 present the assessment of the statistical (IDW and OK) and machine learning (SVM and FFNN) methods by RMSE, MAE and $R^2$. RMSE and MAE are in the same units of pollutant concentrations, and $R^2$ ranges from 0 to 1.

Table 4 Overall performance metrics for the prediction of NO2

Table 5 Overall performance metrics for the prediction of NO

Table 6 Overall performance metrics for the prediction of CO

Table 7 Overall performance metrics for the prediction of SO2

Table 8 Overall performance metrics for the prediction of O3

Table 9 Overall performance metrics for the prediction of PM2.5

Table 10 Overall performance metrics for the prediction of PM10

The results for predicting NO2 and NO showed a better predictive performance of machine learning methods over spatial statistical models. The bi-layer FFNN method presents the maximum $R^{2}$ and the minimum RMSE for estimating the NO2 and NO air pollutants. The proposed models for estimating the CO concentrations result in low accuracy. None of the methods is appropriate for extracting the spatial characteristics of the CO concentrations. A possible reason for this result may be that CO is a primary pollutant whose concentration is closely linked to local combustion phenomena, aggravated in some cases by low dispersion due to thermal inversion or lack of wind. Heavy traffic or traffic jams in a specific area can cause high local measurements of CO while hardly affecting neighbouring areas. A higher density of measurement stations could be needed to get better results for CO because its concentration can decrease quickly with the distance from the pollution source. In the case of SO2, the accuracy of the machine learning methods is vastly superior to that of the spatial statistical models. The FFNN methods show an exceptional improvement when the number of hidden layers is increased. The prediction models of O3, PM2.5, and PM10 present similar performances, with slightly higher accuracy for bi-layer FFNN methods.

The bi-layer FFNN method gives the best result for most of the air pollutants under evaluation; the exception is SO2, for which the tri-layer FFNN performs best. The highest coefficients of determination (about 0.9) are reached by the bi-layer FFNN method for the prediction of NO2 and PM10. In most cases, the methods can be ranked by performance (from best to worst) as follows: bi-layer FFNN, tri-layer FFNN, single-hidden-layer FFNN, SVM, IDW, and OK. The IDW and OK methods give similar results and have the lowest accuracy for all the evaluated air pollutants. The SVM models show lower predictive ability than the FFNN methods for fitting air pollution concentrations.

Figure 5 shows the bi-layer FFNN performance for the seven air pollutants (NO, NO2, CO, SO2, O3, PM2.5 and PM10), based on the 20% of the data used for validation. The left column shows scatter plots of observed versus predicted concentrations for each air pollutant. In all cases, low dispersion is observed along the diagonal of the optimum prediction. The centre column shows the residual plots, which display the difference between the predicted and measured values. The right column contains the residual histograms, which show the relative probability of the prediction errors. In general, many error values lie around zero, and the residual distributions agree with the mean model performance metrics.
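The quantities behind the centre and right columns can be sketched as follows: residuals taken as predicted minus measured values, and a histogram normalised to relative frequencies. The bin width and the sample values are hypothetical, and the binning used for Figure 5 is not specified here.

```python
import math

def residuals(observed, predicted):
    """Predicted minus measured value for each validation sample."""
    return [p - o for o, p in zip(observed, predicted)]

def relative_histogram(values, bin_width):
    """Map each bin's left edge to the fraction of values that fall in that bin."""
    counts = {}
    for v in values:
        left_edge = math.floor(v / bin_width) * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    n = len(values)
    return {edge: count / n for edge, count in sorted(counts.items())}

# Hypothetical validation samples in µg/m³ (illustrative only).
observed = [40.0, 55.0, 30.0, 65.0, 50.0]
predicted = [42.0, 50.0, 33.0, 61.0, 50.5]

res = residuals(observed, predicted)
hist = relative_histogram(res, bin_width=5.0)
for edge, fraction in hist.items():
    print(f"[{edge:6.1f}, {edge + 5.0:6.1f}) -> {fraction:.2f}")
```

A histogram concentrated in the bins next to zero corresponds to the behaviour described for the right column.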

7 Discussion

The main objective of this work was to estimate the air pollution concentration at a given location from the measurements taken at nearby stations, in order to fill in missing values and detect uncalibrated sensors. We have developed several statistical (IDW and OK) and machine learning (SVM, FFNN, bi-layer FFNN, tri-layer FFNN) methods to model the spatial characteristics of the concentrations of seven air pollutants (NO2, NO, CO, SO2, O3, PM2.5, and PM10) measured by Madrid’s air quality monitoring network. The models were evaluated and compared using MAE, RMSE, and $R^2$ as accuracy indicators.
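To make the simplest of the statistical baselines concrete, the sketch below implements a generic Inverse Distance Weighting estimate from nearby stations. The power parameter of 2, the planar coordinates in kilometres, and the station values are assumptions for illustration, not the configuration evaluated in this work.

```python
import math

def idw_estimate(target_xy, stations, power=2.0):
    """Estimate the concentration at target_xy from nearby stations.

    stations is a list of ((x, y), concentration) pairs; each station is
    weighted by 1 / distance**power, so closer stations count more.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in stations:
        distance = math.hypot(x - target_xy[0], y - target_xy[1])
        if distance == 0.0:
            return value  # the target coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical layout: coordinates in km, concentrations in µg/m³.
stations = [((1.0, 0.0), 40.0), ((0.0, 2.0), 60.0)]
estimate = idw_estimate((0.0, 0.0), stations)
print(f"IDW estimate at the target location: {estimate:.1f}")
```

The weights depend only on distance, which is one reason such a baseline cannot adapt to pollutant-specific spatial behaviour the way a trained model can.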

Table 11 Effect of decreasing FFNN architecture

The IDW and OK statistical models reached $R^2$ values greater than 0.75 for NO2, NO, O3, and PM10. The prediction accuracy of the FFNN methods is better than that of IDW and OK for all analysed air pollutants. The accuracy difference between the geostatistical models and the FFNN methods is larger for NO2, NO, and SO2 than for O3, PM2.5, and PM10. The results show the effectiveness of the bi-layer FFNN model in fitting the spatial correlation of the concentrations of NO2, NO, SO2, O3, PM2.5, and PM10. For the prediction of SO2, the tri-layer FFNN model improves on the accuracy of the bi-layer FFNN. Therefore, the proposed systems can be applied directly to provide missing values of the air monitoring network and to detect uncalibrated sensors. None of the methods is appropriate for extracting the spatial characteristics of the CO concentrations. A possible reason is that the CO concentration is closely linked to local combustion phenomena; a higher density of measurement stations may be needed to obtain better results for CO.
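One way the two applications mentioned above could be wired to any of the estimators is sketched below. The decision rule (a tolerance on the absolute deviation and a minimum fraction of deviating readings) is purely hypothetical, since no detection rule is specified in this work, and the sample values are illustrative.

```python
def fill_missing(measured, estimated):
    """Replace missing readings (None) with the spatial estimate."""
    return [e if m is None else m for m, e in zip(measured, estimated)]

def flag_suspect(measured, estimated, tolerance, min_fraction=0.5):
    """Flag a sensor when enough of its valid readings deviate from the estimates.

    tolerance and min_fraction are hypothetical tuning parameters.
    """
    pairs = [(m, e) for m, e in zip(measured, estimated) if m is not None]
    if not pairs:
        return False  # nothing to compare against
    exceed = sum(1 for m, e in pairs if abs(m - e) > tolerance)
    return exceed / len(pairs) >= min_fraction

# Hypothetical hourly readings in µg/m³; None marks a missing measurement.
measured = [40.0, None, 90.0, 95.0]
estimated = [42.0, 50.0, 60.0, 62.0]

filled = fill_missing(measured, estimated)
suspect = flag_suspect(measured, estimated, tolerance=10.0)
print(filled, suspect)
```

In practice the tolerance would need to be chosen per pollutant, for example from the RMSE or MAE of the estimator on validation data.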

As mentioned in Section 2, the models proposed by [12] and [7] reach very high $R^2$ values compared with the rest of the recent research analysed in that section. The proposed bi-layer FFNN-based system reaches a better $R^2$ value, 0.9, for the prediction of NO2. The prediction of SO2 is analysed less frequently in recent related studies and achieves worse performance. The highest $R^2$ value, 0.81, is reported in [12]; in this work, the tri-layer FFNN system reaches an $R^2$ value of 0.79. The model proposed in [12] is based on an LSTM network with n hidden layers and a fully connected layer. According to the results presented in [12] and in this work, adding a fully connected layer improves the performance of the prediction model. The input data of the models proposed by [12] and [7] are air pollution concentrations collected at the monitoring stations together with meteorological data. A possible future improvement is to introduce an LSTM to extract temporal correlation and to add meteorological variables such as temperature, dew point, pressure, wind direction, and wind speed to the input data.

Several ablation experiments were carried out to evaluate the effect of reducing components of the proposed neural network architecture. The effect of the hidden layer size was examined with two implementations of the bi-layer FFNN, with seventy and fifty percent reduction of neurons in the hidden layers. For most of the pollutants analysed, the performance of the structure with a seventy percent reduction is similar to that of the full bi-layer FFNN, with the same $R^2$ value and a slight degradation of MAE and RMSE. In the case of SO2, the $R^2$ value decreases by 0.02. In contrast, reducing the number of neurons in the hidden layers to fifty percent decreases the system’s accuracy in all cases, as shown by the $R^2$ metric in Table 11. The performance for PM10 prediction decreases markedly, to an $R^2$ value of 0.61. To verify the influence of the input features on the performance of the proposed system, the measurements of the farthest station were removed from the input: station 1 for NO2 and NO; station 10 for CO and PM10; and stations 13, 4, and 7 for SO2, O3, and PM2.5, respectively. The third entry in Table 11 shows the performance of the system with four inputs instead of five. The $R^2$ value decreases for all pollutants, with the largest difference for SO2. These ablation experiments were also performed for the single-hidden-layer FFNN and tri-layer FFNN systems, with results similar to those of the bi-layer FFNN.
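The two kinds of ablation can be sketched as configuration helpers: one that shrinks the hidden layers and one that drops the farthest station from the input. The base widths of 20 and 10 neurons, the station identifiers, the distances, and the `keep_fraction` parameter are all hypothetical; the actual layer sizes are not restated in this section.

```python
def scaled_hidden_layers(hidden_sizes, keep_fraction):
    """Hidden-layer widths after keeping only a fraction of the neurons."""
    return [max(1, round(size * keep_fraction)) for size in hidden_sizes]

def parameter_count(n_inputs, hidden_sizes, n_outputs=1):
    """Weights plus biases of a fully connected feed-forward network."""
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))

def drop_farthest(inputs_by_station, distance_km):
    """Remove the input of the station farthest from the target location."""
    farthest = max(distance_km, key=distance_km.get)
    return {s: v for s, v in inputs_by_station.items() if s != farthest}

# Hypothetical bi-layer configuration with five station inputs.
full_layers = [20, 10]
half_layers = scaled_hidden_layers(full_layers, keep_fraction=0.5)

# Hypothetical station readings (µg/m³) and distances to the target (km).
readings = {"s1": 41.0, "s2": 38.0, "s3": 45.0, "s4": 40.0, "s5": 36.0}
distances = {"s1": 3.2, "s2": 1.1, "s3": 2.4, "s4": 0.9, "s5": 1.8}
reduced_inputs = drop_farthest(readings, distances)

print("full network parameters:   ", parameter_count(5, full_layers))
print("halved network parameters: ", parameter_count(5, half_layers))
print("four-input parameters:     ", parameter_count(len(reduced_inputs), full_layers))
```

Counting parameters is only a rough proxy for model capacity, but it shows how much smaller each ablated variant is than the full network.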

8 Conclusion

In this study, three FFNN architectures (with one, two, and three fully connected hidden layers) were implemented and evaluated for modelling the spatial correlation of the concentrations of seven air pollutants: NO2, NO, CO, SO2, O3, PM2.5, and PM10. A comparison with other exposure modelling approaches has been presented: an SVM model and two geostatistical models (IDW and OK). The input dataset consists of historical pollutant measurements collected by Madrid’s air quality monitoring network from January 1, 2016, to December 31, 2018.

The performance results reveal that the bi-layer FFNN and tri-layer FFNN systems are suitable for the spatial prediction of NO2, NO, SO2, O3, PM2.5, and PM10 concentrations, with $R^2$ values of 0.9, 0.83, 0.79, 0.88, 0.75, and 0.91, respectively. The comparison shows that FFNN models are superior to geostatistical methods and slightly better than Support Vector Machines for fitting the spatial correlation of air pollutant measurements (NO2, NO, CO, SO2, O3, PM2.5, and PM10) collected at nearby locations (less than 3.5 kilometres). For the prediction of NO2 and SO2 concentrations, the bi-layer FFNN and tri-layer FFNN models achieve accuracy similar to that of recent studies in which BPNN and deep neural network models were developed.

In future work, we expect to introduce an LSTM neural network to extract the temporal correlation of air pollution concentration. We will also extend the system input with meteorological variables such as temperature, dew point, pressure, wind direction, and wind speed, and evaluate the resulting prediction performance.

Data availability

The datasets analysed during the current study are available in the Air Quality in Madrid (2001-2018) repository https://www.kaggle.com/datasets/decide-soluciones/air-quality-madrid

Notes

European Air Quality Index: https://www.eea.europa.eu/themes/air/air-quality-index. Retrieved 2023/10/26 13:28:50
AIRNOW: https://www.airnow.gov/aqi/aqi-basics. Retrieved 2023/10/26 13:28:50
Instituto Nacional de Estadística (National Statistics Institute): https://www.ine.es/dynt3/inebase/index.htm?padre=1689&capsel=9041. Retrieved October 13, 2023.
Madrid, Retiro - Agencia Estatal de Meteorología - AEMET. Gobierno de España: https://www.aemet.es/es/serviciosclimaticos/datosclimatologicos/valoresclimatologicos?l=3195&k=28. Retrieved October 13, 2023.
Open data portal of the Madrid City Council: https://datos.madrid.es/portal/site/egob. Retrieved October 13, 2023.
Kaggle Air Quality in Madrid (2001-2018): https://www.kaggle.com/datasets/decide-soluciones/air-quality-madrid. Retrieved October 13, 2023.
AIRNOW: https://www.airnow.gov/aqi/aqi-basics. Retrieved October 13, 2023
European Air Quality Index: https://www.eea.europa.eu/themes/air/air-quality-index. Retrieved October 13, 2023
Canada Air Quality Health Index: https://www.canada.ca/en/environment-climate-change/services/air-quality-health-index/about.html. Retrieved October 13, 2023

References

Manisalidis I, Stavropoulou E, Stavropoulos A, Bezirtzoglou E (2020) Environmental and health impacts of air pollution: a review. Front Public Health 8:14
Wolf K, Hoffmann B, Andersen ZJ, Atkinson RW, Bauwelinck M, Bellander T, Brandt J, Brunekreef B, Cesaroni G, Chen J et al (2021) Long-term exposure to low-level ambient air pollution and incidence of stroke and coronary heart disease: a pooled analysis of six european cohorts within the elapse project. The Lancet Planetary Health 5(9):e620–e632
Tainio M, Andersen ZJ, Nieuwenhuijsen MJ, Hu L, De Nazelle A, An R, Garcia LM, Goenka S, Zapata-Diomedi B, Bull F et al (2021) Air pollution, physical activity and health: A mapping review of the evidence. Environ Int 147:105954
Hu F, Guo Y (2021) Health impacts of air pollution in china. Front Environ Sci & Eng 15:1–18
World Health Organization (2021) WHO global air quality guidelines: particulate matter (PM2.5 and PM10), ozone, nitrogen dioxide, sulfur dioxide and carbon monoxide. World Health Organization
Berrocal VJ, Guan Y, Muyskens A, Wang H, Reich BJ, Mulholland JA, Chang HH (2020) A comparison of statistical and machine learning methods for creating national daily maps of ambient pm2.5 concentration. Atmos Environ 222:117130
Chae S, Shin J, Kwon S, Lee S, Kang S, Lee D (2021) Pm10 and pm2.5 real-time prediction models using an interpolated convolutional neural network. Sci Rep 11(1):11952
Tzanis CG, Alimissis A, Koutsogiannis I (2021) Addressing missing environmental data via a machine learning scheme. Atmosphere 12(4):499
Han P, Mei H, Liu D, Zeng N, Tang X, Wang Y, Pan Y (2021) Calibrations of low-cost air pollution monitoring sensors for co, no2, o3, and so2. Sensors 21(1):256
Araujo LN, Belotti JT, Alves TA, de Souza Tadano Y, Siqueira H (2020) Ensemble method based on artificial neural networks to estimate air pollution health risks. Environ Model & Softw 123:104567
Yan R, Liao J, Yang J, Sun W, Nong M, Li F (2021) Multi-hour and multi-site air quality index forecasting in beijing using cnn, lstm, cnn-lstm, and spatiotemporal clustering. Expert Syst Appl 169:114513
Seng D, Zhang Q, Zhang X, Chen G, Chen X (2021) Spatiotemporal prediction of air quality based on lstm neural network. Alex Eng J 60(2)
Huang Y, Ying JJC, Tseng VS (2021) Spatio-attention embedded recurrent neural network for air quality prediction. Knowl-Based Syst 233:107416
Thongthammachart T, Araki S, Shimadera H, Eto S, Matsuo T, Kondo A (2021) An integrated model combining random forests and wrf/cmaq model for high accuracy spatiotemporal pm2.5 predictions in the kansai region of japan. Atmos Environ 262:118620
Appel KW, Bash JO, Fahey KM, Foley KM, Gilliam RC, Hogrefe C, Hutzell WT, Kang D, Mathur R, Murphy BN et al (2021) The community multiscale air quality (cmaq) model versions 5.3 and 5.3.1: system updates and evaluation. Geosci Model Dev 14(5):2867–2897
Wang P, Wang P, Chen K, Du J, Zhang H (2022) Ground-level ozone simulation using ensemble wrf/chem predictions over the southeast united states. Chemosphere 287:132428
Wang T, Li J, Pan J, Ji D, Kim Y, Wu L, Wang X, Pan X, Sun Y, Wang Z et al (2022) An integrated air quality modeling system coupling regional-urban and street models in beijing. Urban Climate 43:101143
Kong L, Tang X, Zhu J, Wang Z, Li J, Wu H, Wu Q, Chen H, Zhu L, Wang W et al (2021) A 6-year-long (2013–2018) high-resolution air quality reanalysis dataset in china based on the assimilation of surface observations from cnemc. Earth Syst Sci Data 13(2):529–570
Baklanov A, Zhang Y (2020) Advances in air quality modeling and forecasting. Global Trans 2:261–270
Liu DR, Hsu YK, Chen HY, Jau HJ (2021) Air pollution prediction based on factory-aware attentional lstm neural network. Comput 103:75–98
Maleki H, Sorooshian A, Goudarzi G, Baboli Z, Tahmasebi Birgani Y, Rahmati M (2019) Air pollution prediction by using an artificial neural network model. Clean Techn Environ Policy 21:1341–1352
Webster R, Oliver MA (2007) Geostatistics for environmental scientists. Wiley
Du S, Li T, Yang Y, Horng SJ (2019) Deep air quality forecasting using hybrid deep learning framework. IEEE Trans Knowl Data Eng 33(6):2412–2424
Pak U, Ma J, Ryu U, Ryom K, Juhyok U, Pak K, Pak C (2020) Deep learning-based pm2.5 prediction considering the spatiotemporal correlations: A case study of beijing, china. Sci Total Environ 699:133561
Zhang B, Zhang H, Zhao G, Lian J (2020) Constructing a pm2.5 concentration prediction model by combining auto-encoder with bi-lstm neural networks. Environ Model & Softw 124:104600
Arsov M, Zdravevski E, Lameski P, Corizzo R, Koteli N, Gramatikov S, Mitreski K, Trajkovik V (2021) Multi-horizon air pollution forecasting with deep neural networks. Sensors 21(4):1235
Van Roode S, Ruiz-Aguilar J, González-Enrique J, Turias I (2019) An artificial neural network ensemble approach to generate air pollution maps. Environ Monit Assess 191:1–15
Zhang B, Zou G, Qin D, Lu Y, Jin Y, Wang H (2021) A novel encoder-decoder model based on read-first lstm for air pollutant prediction. Sci Total Environ 765:144507
Cordova CH, Portocarrero MNL, Salas R, Torres R, Rodrigues PC, López-Gonzales PC (2021) Air quality assessment and pollution forecasting using artificial neural networks in metropolitan lima-peru. Sci Rep 11(1):24232
Zhu J, Deng F, Zhao J, Zheng H (2021) Attention-based parallel networks (apnet) for pm2.5 spatiotemporal prediction. Sci Total Environ 769:145082
Sayeed A, Choi Y, Eslami E, Lops Y, Roy A, Jung J (2020) Using a deep convolutional neural network to predict 2017 ozone concentrations, 24 hours in advance. Neural Netw 121:396–408
Gómez-Losada Á, Santos FM, Gibert K, Pires JC (2019) A data science approach for spatiotemporal modelling of low and resident air pollution in madrid (spain): Implications for epidemiological studies. Comput Environ Urban Syst 75:1–11
Linares C, Díaz J, Negev M, Martínez GS, Debono R, Paz S (2020) Impacts of climate change on the public health of the mediterranean basin population-current situation, projections, preparedness and adaptation. Environ Res 182:109107
Laña I, Del Ser J, Padró A, Vélez M, Casanova-Mateo C (2016) The role of local urban traffic and meteorological conditions in air pollution: A data-based case study in madrid, spain. Atmos Environ 145:424–438
Webster R, Oliver MA (2007) Geostatistics for environmental scientists. Wiley
Li J, Heap AD (2014) Spatial interpolation methods applied in the environmental sciences: A review. Environ Model & Softw 53:173–189
Wackernagel H (2003) Multivariate geostatistics: an introduction with applications. Springer Science & Business Media
Hu K, Sivaraman V, Bhrugubanda H, Kang S, Rahman A (2016) In 2016 IEEE SENSORS, IEEE, pp 1–3
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297
Kohonen T (1988) An introduction to neural computing. Neural Netw 1(1):3–16
Nocedal J, Wright SJ (1999) Numerical optimization. Springer

Funding

Funding for open access publishing: Universidad de Sevilla/CBUA. This work was supported by the Spanish grant (with support from the European Regional Development Fund) SANEVEC (TED2021-130825B-I00) and by AIRMOV (P009-21/E03).

Author information

Authors and Affiliations

Robotics and Technology of Computers Lab (RTC), ETSI Informática, Universidad de Sevilla, Avd. Reina Mercedes s/n, Sevilla, 41012, Spain
Elena Cerezuela-Escudero, Juan Manuel Montes-Sanchez, Juan Pedro Dominguez-Morales, Lourdes Duran-Lopez & Gabriel Jimenez-Moreno

Corresponding author

Correspondence to Elena Cerezuela-Escudero.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Cerezuela-Escudero, E., Montes-Sanchez, J.M., Dominguez-Morales, J.P. et al. A systematic comparison of different machine learning models for the spatial estimation of air pollution. Appl Intell 53, 29604–29619 (2023). https://doi.org/10.1007/s10489-023-05109-y

Accepted: 12 October 2023
Published: 31 October 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s10489-023-05109-y
