1 Introduction

Spatio-temporal data are ubiquitous in nature. Meteorological time series, collected from spatially distributed weather stations or from sensor networks, are a prominent example. Predicting meteorological time series is essential not only for anticipating weather conditions and taking adequate measures, but also for the proper management of energy [6]. However, the major challenge in meteorological time series prediction is the complex spatio-temporal inter-relationships among the variables. Modeling such space-time dependency becomes more complicated when the number of spatially distributed influencing variables becomes considerably large. For the same reason, the effectiveness of many existing space-time prediction models, especially graph-based ones, is significantly hindered.

In the present work, we propose an improved graph-based probabilistic model to handle such spatio-temporal prediction scenarios. The approach is based on the spatial Bayesian network (SpaBN) [3], which can efficiently model the influence of a large number of spatially distributed variables. Previously, SpaBN has shown encouraging performance in the domain of hydrology [3]. Here, we apply SpaBN as the base technique behind our proposed space-time model for meteorological time series prediction.

1.1 Problem Statement and Contributions

The relevant prediction problem can be formally stated as follows:

  • Given the historical daily time series data set over n meteorological variables \(M = \left\{ m_1,m_2,\cdots ,m_n\right\} \), corresponding to a set of l spatial locations \(L=\left\{ l_1,l_2,\cdots ,l_l\right\} \), for the previous t years \(\left\{ y_1,y_2,\cdots ,y_t\right\} \), the problem is to determine the daily time series of the variables in M, for any location \(x \in (L\cup Z)\), for the future q years \(\left\{ y_{(t+1)},y_{(t+2)},\cdots ,y_{(t+q)}\right\} \), where \((Z \cap L)=\emptyset \) and q is a positive integer.

In this context, spatio-temporal prediction of any meteorological variable \(m \in M\) needs to consider the influences of its co-located variables in M from several spatially distributed locations in L. Although graphical models such as Bayesian networks are highly suitable for modeling such influences [2], considering a separate influencing node for each location \(l_i \in L\) makes the graphical structure, as well as the analysis process, extremely complex.

The present work attempts to address this issue by utilizing the modeling ability of the spatial Bayesian network (SpaBN). The major contributions in this regard are as follows:

  • Exploring the spatial Bayesian network (SpaBN) analysis in multivariate prediction of meteorological time series data;

  • Proposing a SpaBN-based spatio-temporal prediction model that is capable of efficiently modeling complex inter-variable dependency over space and time;

  • Validating the proposed space-time model with respect to prediction of daily temperature, humidity and precipitation rate around Kolkata, India;

  • Demonstrating the effectiveness of the proposed prediction model in comparison with benchmark and state-of-the-art space-time prediction techniques.

The remainder of the paper is organized as follows. The details of the proposed space-time model are presented in Sect. 2. The results of the empirical study are reported in Sect. 3. Finally, we conclude in Sect. 4.

2 Proposed Space-Time Model

As depicted in Fig. 1, the proposed prediction model comprises three key steps: (i) data pre-processing, (ii) spatial weight calculation, and (iii) spatio-temporal prediction. Each of these steps is described below.

Fig. 1. Framework of the proposed spatio-temporal prediction model

2.1 Data Pre-processing

The main objective of the data pre-processing step is to discretize the continuous meteorological variables, so as to make them suitable for the discrete Bayesian analysis of SpaBN in the subsequent steps. For each continuous variable \(m_i \in M\), the discretization takes the maximum and minimum observed values (\(max(m_i)\) and \(min(m_i)\)) in the historical time series and divides the whole range into an appropriate number (R) of bins or sub-ranges of size \(size(subRange)=\frac{max(m_i)- min(m_i)}{R}\). The value of R is determined empirically.
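The equal-width binning described above can be illustrated with a minimal sketch. The function name `equal_width_bins` and the sample values are hypothetical, and clamping the maximum value into the last bin is an assumption, since the text does not say how the upper edge is handled.

```python
def equal_width_bins(values, num_bins):
    """Map each continuous value to a bin index in 0..num_bins-1.

    Bin width follows size(subRange) = (max - min) / R, with R = num_bins.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate series: every value falls into the single first bin.
        return [0] * len(values)
    width = (hi - lo) / num_bins
    # ASSUMPTION: the maximum lies exactly on the upper edge, so it is
    # clamped into the last bin instead of opening a new one.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


# Hypothetical daily temperatures (degrees Celsius), discretized with R = 3.
temps = [18.0, 21.5, 25.0, 30.0, 36.0]
labels = equal_width_bins(temps, 3)
print(labels)  # [0, 0, 1, 2, 2]
```

In practice R would be tuned empirically, as stated above; a larger R gives finer states but sparser counts per state.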

Moreover, in the pre-processing step we also reduce the size of the training data set by taking into account the temporal variability of the meteorological variables over short periods of time. For example, rainfall generally shows monthly variation. Therefore, to make a prediction for a particular day, only the rainfall data of the associated month are considered, rather than the rainfall data of the whole year.
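As a small illustration of this month-wise selection, the sketch below keeps only the historical records that fall in the same calendar month as the prediction day. The record layout (a list of `(date, value)` pairs) and the function name are assumptions made for the example.

```python
from datetime import date


def same_month_records(records, target_day):
    """Keep (date, value) pairs whose calendar month matches target_day."""
    return [(d, v) for d, v in records if d.month == target_day.month]


# Hypothetical rainfall records from different years and months.
records = [
    (date(2014, 7, 1), 12.0),
    (date(2014, 12, 1), 0.0),
    (date(2015, 7, 15), 8.5),
]
july = same_month_records(records, date(2016, 7, 10))
print(len(july))  # 2
```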

2.2 Spatial Weight/Importance Calculation

This step aims at determining the spatial importance or spatial weight (\(SW_i\)) of each location \(l_i \in L\) with respect to the prediction location. The spatial weight \(SW_i\) is measured based on the spatial distance (\(SD_i\)) and the correlation between the time series of each variable at the neighboring location and that at the prediction location. Suppose \(NCorr_{m_j}^i\) is the normalized correlation value between the time series of variable \(m_j\) at the i-th neighboring location and that at the prediction location, such that \(NCorr_{m_j}^i \in [0,1]\). Then, the spatial weight of the i-th location is determined as follows:

$$\begin{aligned} SW_i=\frac{\sum _{j=1}^{|M|}{NCorr_{m_j}^i}+NISD_i}{\sum _{k=1}^{|L|}(\sum _{j=1}^{|M|}{NCorr_{m_j}^k}+NISD_k)} \end{aligned}$$
(1)

where \(NISD_i\) is the normalized inverse spatial distance of the i-th location from the prediction location, such that \(NISD_i \in [0,1]\).
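Eq. 1 can be sketched as follows, assuming the normalized correlations and normalized inverse distances have already been computed (the text does not specify the normalization scheme, so they are taken as inputs here). The function name and the sample numbers are hypothetical.

```python
def spatial_weights(ncorr, nisd):
    """Compute SW_i for every location as in Eq. 1.

    ncorr[i][j]: normalized correlation of variable m_j at location i, in [0, 1].
    nisd[i]:     normalized inverse spatial distance of location i, in [0, 1].
    """
    # Numerator of Eq. 1 for each location: sum of correlations plus NISD.
    scores = [sum(row) + d for row, d in zip(ncorr, nisd)]
    total = sum(scores)  # denominator of Eq. 1
    if total == 0:
        raise ValueError("all location scores are zero; weights are undefined")
    return [s / total for s in scores]


# Two hypothetical neighboring locations, three variables each.
ncorr = [[0.9, 0.8, 0.7], [0.5, 0.4, 0.3]]
nisd = [1.0, 0.2]
weights = spatial_weights(ncorr, nisd)
```

By construction the weights sum to one, and a location that is both closer and better correlated receives the larger weight.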

2.3 Spatio-Temporal Prediction

During the prediction process in the proposed space-time model, the effect of the spatial influence of the meteorological variables is learnt with the help of SpaBN analysis. To describe the learning process, consider an example scenario where \(M_1, M_2\) and \(M_3\) are three arbitrary meteorological variables. Also, let \(M_1\) be independent, \(M_2\) be influenced by \(M_1\), and \(M_3\) be influenced by \(M_1\) and \(M_2\). Because of the inherent spatio-temporal inter-relationships among these variables, a variable at one location is also influenced by the variables at its neighboring locations. Therefore, a causal dependency graph comprising the representative variables from all the neighboring locations leads to a complex graphical structure for capturing the spatio-temporal inter-relationships among the variables (refer to Fig. 2a).

In such a scenario, the spatial Bayesian network (SpaBN) can be an appropriate tool for modeling these spatio-temporal inter-relationships efficiently. As shown in the SpaBN structure (refer to Fig. 2b), all the standard/classical nodes associated with the same but spatially distributed variable are replaced with a single composite node, denoted by a double-lined circle. This replacement reduces both the structural and the algorithmic complexity of the Bayesian analysis to a great extent.

Fig. 2. Graph based modeling of spatio-temporal inter-relationships among the three meteorological variables: (a) a complex causal dependency graph of standard BN, (b) equivalent SpaBN structure [considering no. of locations \(|L|=6\)]

Let |L| be the number of neighboring locations considered. Then, according to the principle of SpaBN, the marginal and conditional probabilities of the variables in Fig. 2b are estimated as follows:

$$\begin{aligned} P(M_1)=\gamma \cdot \left[ \sum _{i=1}^{|L|} P(M_1^i)\cdot SW_i\right] \end{aligned}$$
(2)
$$\begin{aligned} P(M_2)=\gamma \cdot \left[ \sum _{i=1}^{|L|} P(M_2^i)\cdot SW_i\right] \end{aligned}$$
(3)
$$\begin{aligned} P(M_3)=\gamma \cdot \left[ \sum _{i=1}^{|L|} P(M_3^i)\cdot SW_i\right] \end{aligned}$$
(4)
$$\begin{aligned} P(M_2|M_1)=\gamma \cdot \left[ \sum _{i=1}^{|L|}{\frac{n(M_1^i,M_2^i)}{n(M_1^i)} \cdot SW_i}\right] \end{aligned}$$
(5)
$$\begin{aligned} P(M_3|M_1,M_2)=\gamma \cdot \left[ \sum _{i=1}^{|L|}{\frac{n(M_1^i,M_2^i,M_3^i)}{n(M_1^i,M_2^i)} \cdot SW_i}\right] \end{aligned}$$
(6)

where \(M_j^i\) denotes the variable \(M_j\) at the i-th neighboring location; \(SW_i\) is the spatial weight/importance of the i-th neighboring location with respect to the prediction location; \(\gamma \) is the normalization constant; and \(n(\langle \cdot \rangle )\) is the count of observations for the variable value combination \(\langle \cdot \rangle \).
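To make the weighted estimate in Eq. 5 concrete, the following minimal sketch computes \(P(M_2|M_1)\) for one parent state from per-location observation counts. The data layout, the function name, and the handling of locations where the parent state never occurs (they are skipped) are assumptions of this illustration, not details given in the text.

```python
def weighted_conditional(pairs_by_loc, weights, parent_state, child_states):
    """Spatially weighted P(child | parent = parent_state), in the style of Eq. 5.

    pairs_by_loc[i]: list of (parent, child) discrete states seen at location i.
    weights[i]:      spatial weight SW_i of location i.
    """
    raw = {c: 0.0 for c in child_states}
    for pairs, w in zip(pairs_by_loc, weights):
        n_parent = sum(1 for p, _ in pairs if p == parent_state)
        if n_parent == 0:
            # ASSUMPTION: a location with no observation of this parent
            # state contributes nothing to the estimate.
            continue
        for c in child_states:
            n_joint = sum(1 for p, ch in pairs if p == parent_state and ch == c)
            raw[c] += w * n_joint / n_parent
    total = sum(raw.values())
    if total == 0:
        raise ValueError("parent state never observed at any location")
    # gamma = 1 / total rescales the weighted sums into a distribution.
    return {c: p / total for c, p in raw.items()}


# Two hypothetical locations with binary states for (M1, M2).
pairs_by_loc = [
    [(0, 0), (0, 1), (1, 1), (0, 0)],
    [(0, 1), (1, 1), (0, 1), (1, 0)],
]
cond = weighted_conditional(pairs_by_loc, [0.75, 0.25], 0, [0, 1])
```

The marginals in Eqs. 2 to 4 follow the same pattern, with per-location relative frequencies in place of the count ratio.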

Now, to capture the temporal evolution of the inter-variable dependencies, the SpaBN-based learning process described above is performed separately with the historical data of each year, and the probabilistic estimates are finally combined in a weighted manner to obtain the probability distributions corresponding to the inter-variable dependencies in the prediction year. A higher temporal weight (tw) is assigned to a year that is temporally nearer to the prediction year. The overall process of space-time learning is summarized in Algorithm 1, considering the immediately following prediction year \(y_{(t+1)}\).

Algorithm 1. Space-time learning process
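The temporally weighted combination of yearly estimates can be sketched as below. This is not a reproduction of Algorithm 1: the exact temporal weighting scheme is not given in the text, so the linear weights 1, 2, ..., t (oldest to most recent year) are purely an assumption, as are the function name and the sample distributions.

```python
def combine_over_years(yearly_dists, temporal_weights=None):
    """Combine per-year probability estimates into one distribution.

    yearly_dists: list of dicts (state -> probability), ordered from the
                  oldest year y_1 to the most recent year y_t.
    """
    t = len(yearly_dists)
    if temporal_weights is None:
        # ASSUMPTION: linearly increasing weights, so that years nearer to
        # the prediction year count more. The source does not fix tw.
        temporal_weights = [k + 1 for k in range(t)]
    total_w = sum(temporal_weights)
    states = list(yearly_dists[0].keys())
    return {
        s: sum(w * d[s] for w, d in zip(temporal_weights, yearly_dists)) / total_w
        for s in states
    }


# Hypothetical yearly estimates for a two-state variable.
yearly = [
    {"low": 0.6, "high": 0.4},
    {"low": 0.5, "high": 0.5},
    {"low": 0.2, "high": 0.8},
]
combined = combine_over_years(yearly)
```

Because each yearly estimate is itself a distribution and the weights are normalized, the combined result is again a distribution, dominated here by the most recent year.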

Once the parameter learning is over, the inference is generated as per SpaBN by utilizing the spatial weights (\(SW_i\)). For example, let the observed (evidence) variables be \(M_1\) and \(M_2\), from which the value of \(M_3\) is to be inferred.

Then, as per the principle of SpaBN,

$$\begin{aligned} Inferred\,value\,of\,M_3 = \sum _{i=1}^{|L|} P(M_3^i|M_1^i,M_2^i)\cdot SW_i \end{aligned}$$
(7)

where the value of \(P(M_3^i|M_1^i,M_2^i)\) is obtained from the conditional probability table of the variable \(M_3\). The weighted sum is evaluated for each candidate value of \(M_3\), and the predicted value is the one corresponding to the maximum probability.
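The inference step in Eq. 7, followed by the selection of the most probable value, can be illustrated with a minimal sketch. The nested-dictionary layout of the conditional probability tables and all numbers are hypothetical.

```python
def infer_state(cpts_by_loc, weights, evidence, states):
    """Score every candidate state of M3 as in Eq. 7 and return the best one.

    cpts_by_loc[i][evidence][s] = P(M3^i = s | M1^i, M2^i = evidence).
    weights[i] = spatial weight SW_i of location i.
    """
    scores = {
        s: sum(w * cpt[evidence][s] for cpt, w in zip(cpts_by_loc, weights))
        for s in states
    }
    # The predicted state is the one with the maximum weighted probability.
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical CPT rows for the evidence (M1 = 0, M2 = 1) at two locations.
cpts_by_loc = [
    {(0, 1): {0: 0.2, 1: 0.5, 2: 0.3}},
    {(0, 1): {0: 0.6, 1: 0.3, 2: 0.1}},
]
best, scores = infer_state(cpts_by_loc, [0.7, 0.3], (0, 1), [0, 1, 2])
print(best)  # 1
```

The returned state is a bin index; mapping it back to a representative continuous value (for example, a bin midpoint) is a separate design choice that the text does not specify.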

3 Experimental Evaluation

This section describes the empirical study carried out for evaluating our proposed space-time prediction model.

3.1 Data Set and Study Area

The proposed prediction model has been validated by forecasting the daily time series of three primary meteorological variables, namely temperature, relative humidity, and precipitation rate, around Kolkata [22.57\(^\circ \)N, 88.36\(^\circ \)E], India. The corresponding historical daily time series data have been collected from the FetchClimate Explorer [4] for a span of 10 years (2006–2015). Predictions have been carried out for two locations (Loc-1 [22.93\(^\circ \)N, 87.25\(^\circ \)E] and Loc-2 [22.82\(^\circ \)N, 88.29\(^\circ \)E]) for the year 2016 (refer to Fig. 3).

Fig. 3. Study area around Kolkata [22.57\(^\circ \)N, 88.36\(^\circ \)E] in West Bengal (India)

3.2 Results

The comparative study has been made with benchmark prediction techniques, such as automated ARIMA (R-Tool), standard BN (SBN), and ANN (MATLAB nnTool), and with state-of-the-art space-time models, namely the hierarchical Bayesian auto-regressive (HBAR) model [5] and spatio-temporal ordinary Kriging (ST-OK) [1]. The prediction performance has been measured in terms of four popular statistical measures, namely normalized root mean square deviation (NRMSD) [3], Pearson’s correlation coefficient (CC), mean absolute error (MAE), and mean absolute percentage error (MAPE) [2]. A perfect fit between the observed and predicted values yields NRMSD = 0, CC = 1, MAE = 0 and MAPE = 0. The results of prediction are summarized in Tables 1, 2 and 3.
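For reference, the four measures named above can be computed as in the following sketch. The exact definitions used in the study are those of the cited works; here NRMSD is assumed to be the RMSD divided by the observed range, and MAPE is assumed to skip days with a zero observed value (which can occur for precipitation). Both choices are assumptions of this illustration.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nrmsd(obs, pred):
    """RMSD normalized by the observed range (ASSUMED normalization)."""
    rmsd = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))
    return rmsd / (max(obs) - min(obs))


def mape(obs, pred):
    """Mean absolute percentage error; zero observations are skipped (ASSUMPTION)."""
    terms = [abs((o - p) / o) for o, p in zip(obs, pred) if o != 0]
    return 100.0 * sum(terms) / len(terms)


def pearson_cc(obs, pred):
    """Pearson's correlation coefficient between observed and predicted series."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


# Hypothetical observed and predicted daily temperatures.
obs = [20.0, 22.0, 25.0, 30.0]
pred = [21.0, 21.0, 26.0, 29.0]
results = (nrmsd(obs, pred), pearson_cc(obs, pred), mae(obs, pred), mape(obs, pred))
```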

Table 1. Comparative study of Temperature prediction
Table 2. Comparative study of Relative humidity prediction
Table 3. Comparative study of Precipitation rate prediction

Discussions: On analyzing the results, presented in Tables 1, 2 and 3, the following inferences can be drawn:

  • In all the cases, the proposed prediction model produces the lowest NRMSDs and MAEs, which are significantly lower than those of the other prediction techniques considered. This indicates the superiority of our SpaBN-based prediction over the others.

  • The high values of CC (\({\approx }1\)) (refer to Tables 1, 2 and 3) reveal that the series predicted by our model match the observed time series most closely.

  • The least MAPE values corresponding to the proposed space-time prediction model also demonstrate its better efficacy in comparison with the others.

Moreover, the improvement in computation time of the proposed model with respect to standard Bayesian network (SBN) based prediction has also been studied, considering the number of neighboring locations \(|L|=6\) and variable domain size \(R=3\). The result (refer to Table 4) demonstrates the effectiveness of using SpaBN for modeling spatio-temporal dependency among the spatially distributed meteorological variables, as accomplished by our proposed space-time prediction model.

Table 4. Comparative study of computation time (considering \(|L|=6\), \(R=3\), single prediction day, and SBN with spatially distributed variables)

4 Conclusions

The objective of the present work is to address the challenge of handling complex spatio-temporal dependency among the meteorological variables during multivariate time series prediction. For that purpose, we have proposed a space-time model based on the spatial Bayesian network (SpaBN), which is inherently capable of efficiently modeling the inter-dependency among a large number of spatially distributed variables. An experimental study has been carried out in comparison with several benchmarks (ARIMA, SBN, ANN) and state-of-the-art prediction techniques (HBAR, ST-OK). Overall, the proposed space-time model has shown encouraging performance with respect to both accuracy and computational cost in meteorological time series prediction.