1 Introduction

Ordinary differential equations (ODEs) are used to describe Linear Time-Invariant (LTI) systems in many fields, such as biology, engineering, and economics [3]. Generally, an ODE can be represented as

$$\begin{aligned} b_p\frac{\text {d}^p f(t)}{\text {d}t^p} + \ldots + b_1\frac{\text {d} f(t)}{\text {d}t} + b_0 f(t) = u(t), \end{aligned}$$
(1)

where p is the order of the differential equation, \(b_i\) (with \(i \in \{0, 1,\ldots ,\ p \}\)) is the coefficient that weights the i-th derivative of f(t) w.r.t. t, u(t) is the forcing function, and f(t) is the solution function. Equation (1) can be rewritten as the following convolution,

$$\begin{aligned} f(t) = \int _0^t h(\tau )u(t-\tau )\text {d}\tau , \end{aligned}$$
(2)

where h(t) is a function of all the \(b_i\)’s and is also known as the impulse response function (IRF).
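As an illustrative sketch (not part of the original formulation), the convolution in (2) can be checked numerically for a first-order system, \(\text {d}f/\text {d}t + f(t) = u(t)\), whose IRF is \(h(t) = e^{-t}\); the forcing function and grid below are arbitrary choices, and the result is compared against the closed-form solution.

```python
# Sketch: solving df/dt + f(t) = u(t) via the convolution in Eq. (2).
# The IRF of this first-order system is h(t) = exp(-t); u(t) = sin(t)
# is an illustrative forcing function.
import numpy as np

t = np.linspace(0.0, 5.0, 501)
dt = t[1] - t[0]
u = np.sin(t)                      # forcing function u(t)
h = np.exp(-t)                     # impulse response h(t) = e^{-t}

# f(t) = int_0^t h(tau) u(t - tau) dtau, approximated by a discrete sum
f_conv = np.convolve(h, u)[: len(t)] * dt

# Closed-form solution of df/dt + f = sin(t), f(0) = 0, for comparison:
# f(t) = (sin(t) - cos(t) + e^{-t}) / 2
f_exact = 0.5 * (np.sin(t) - np.cos(t) + np.exp(-t))
```

The discrepancy between the discrete sum and the exact solution is of the order of the step size, which motivates the discrete-sum approximations used later in the paper.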

In [2], a Gaussian Process (GP) model is introduced where the covariance function of the output is built from the convolution of a smoothing kernel and the covariance of a white noise process. This model was later extended into Latent Force Models (LFMs) [1] by replacing the smoothing kernel with the system’s IRF; thus, in the LFM framework, the IRF is encoded within the covariance function. Moreover, LFMs are an appealing tool because system identification can be performed by learning the model from the available output data.

In many system identification applications, the order of the ODE and its parametrization are unknown. This problem has been addressed by approximating the IRF using Orthogonal Basis Functions (OBFs) [8, 10]. Laguerre functions are a type of OBF characterized by a single pole, i.e. one parameter that controls the waveform of the basis functions. Laguerre functions have been used in system identification tasks, as in [5, 12].

In this paper, we propose to estimate the IRF using the Laguerre functions encoded in the covariance function of a Convolved Gaussian Process (CGP). Laguerre parameters are learned during the maximization of the CGP marginal likelihood function.

This paper is organized as follows. In Sect. 2, a brief overview of CGP theory is given, including the mathematical construction of all covariance functions required to perform system identification. In Sect. 3, the Laguerre functions are introduced. Using the theory developed in the previous sections, the proposed model is presented in Sect. 4. Results showing the ability of the model to estimate the IRF under different scenarios are presented in Sect. 5. Finally, a discussion of similar approaches is given in Sect. 6.

2 Convolved Gaussian Processes

In this section, an overview of the mathematical foundation of CGPs is given. We start by considering a multiple-input multiple-output (MIMO) dynamical system with D outputs and Q inputs. In this system, the d-th output is described by

$$\begin{aligned} y_d(t) = f_d(t) + \epsilon _d(t), \end{aligned}$$
(3)

with \(\epsilon _d(t) \sim \mathcal {N}(0, \sigma _d^2)\) and

$$\begin{aligned} f_{d}(t) = \int _0^t h_d(t-\tau ) \sum _{q=1}^{Q} u_q(\tau )\text {d}\tau , \end{aligned}$$
(4)

where \(h_d(t)\) is a smoothing kernel which corresponds to the system’s IRF [1]. The set of inputs \(\{u_q(z)\}_{q=1}^{Q}\) are independent GPs, and each one is defined as

$$\begin{aligned} u_q \sim \mathcal {GP}(0, k_{u_q,u_q}(z,z')). \end{aligned}$$
(5)

where \(k_{u_q,u_q}(z,z')\) is the covariance function used to model the input functions. As in [1], we adopt the squared exponential covariance function, defined as follows

$$ k_{u_q,u_q}(z,z') = \exp \left( -\frac{(z-z')^2}{l_q^2} \right) , $$

where \(l_q\) is the lengthscale associated with the q-th input. From Eqs. (3), (4) and (5), we are able to model the set of outputs \(\{y_d(t)\}_{d=1}^{D}\) as a GP with zero mean and covariance function given by

$$\begin{aligned} k_{y_d, y_{d'}}(t,t') = k_{f_d, f_{d'}}(t,t') + \sigma _d^2 \delta _{d,d'}\delta _{t,t'}, \end{aligned}$$

with

$$ k_{f_d, f_{d'}}(t,t') = \sum _{q=1}^{Q} k^{(q)}_{f_d, f_{d'}}(t,t'), $$

where \(k^{(q)}_{f_d, f_{d'}}(t,t')\) is defined as

$$ \int _0^t h_d(t-\tau ) \int _0^{t'} h_{d'}(t'-\tau ') k_{u_q,u_q} (\tau , \tau ') \text {d}\tau ' \text {d}\tau . $$

Additionally, we need to define the cross-covariance between outputs and inputs, which is given by

$$ k_{f_d,u_q}(t,t') = \int _0^t h_d(t-\tau ) k_{u_q,u_q} (\tau , t') \text {d}\tau . $$

The covariance function \(k_{f_d,u_q}(t,t')\) allows us to make predictions from the output domain to the input domain, and vice versa. This property is exploited in the proposed model defined in Sect. 4.
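As a minimal sketch of the squared exponential input covariance defined above, the following evaluates the kernel over all pairs of time points; the grid and lengthscale are illustrative choices.

```python
# Sketch of the squared exponential input covariance,
# k(z, z') = exp(-(z - z')^2 / l^2), evaluated on all pairs of points.
import numpy as np

def k_se(z, z2, lengthscale):
    """Squared exponential kernel matrix between grids z and z2."""
    d = z[:, None] - z2[None, :]
    return np.exp(-(d ** 2) / lengthscale ** 2)

z = np.linspace(0.0, 1.0, 5)       # illustrative time grid
K = k_se(z, z, lengthscale=0.5)
# K is symmetric, positive semi-definite, with unit diagonal
```

Matrices of this form are the building blocks of the larger covariance matrices assembled in Sect. 4.1.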

3 Laguerre Functions

Laguerre functions form a complete and orthonormal set of basis functions, where each function is defined as [5]

$$ l_m(t) = \mathcal {L}_m(t) \exp (-\gamma t), $$

where \(\gamma \) is a free parameter known as Laguerre scale [4], and

$$ \mathcal {L}_m(t) = \sqrt{2 \gamma }\sum _{k=0}^{m} \frac{(-1)^{k}m!2^{m-k}}{k![(m-k)!]^2}(2 \gamma t)^{m-k}. $$

The m-th degree polynomial \(\mathcal {L}_m(t)\) is called the m-th Laguerre polynomial. It is also interesting to note that the Laguerre function \(l_m(t)\) has m zero crossings, given by the zeros of \(\mathcal {L}_m(t)\). Additionally, each Laguerre function represents a dynamical system characterized by a single pole with multiplicity; for instance, the Laplace transform of the m-th Laguerre function is given by

$$ L_m(s) = \sqrt{2\gamma } \frac{(\gamma - s)^m}{(\gamma + s)^{m+1}}, $$

where the Laguerre scale parameter \(\gamma \) controls the position of the pole and the zeros of each Laguerre function.
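The Laguerre functions can be sketched numerically using the standard Laguerre polynomials via their three-term recurrence, with \(l_m(t) = \sqrt{2\gamma }\, e^{-\gamma t} L_m(2\gamma t)\) (an equivalent formulation of the expression above, possibly up to a sign convention); the Laguerre scale below is an arbitrary choice, and orthonormality is verified numerically.

```python
# Sketch of the Laguerre functions built from the standard Laguerre
# polynomials; orthonormality on [0, inf) is checked by quadrature.
import numpy as np

def laguerre_poly(m, x):
    # Standard Laguerre polynomials L_m(x) via the three-term recurrence:
    # (k + 1) L_{k+1}(x) = (2k + 1 - x) L_k(x) - k L_{k-1}(x)
    Lprev, Lcur = np.ones_like(x), 1.0 - x
    if m == 0:
        return Lprev
    for k in range(1, m):
        Lprev, Lcur = Lcur, ((2.0 * k + 1.0 - x) * Lcur - k * Lprev) / (k + 1.0)
    return Lcur

def laguerre_function(m, t, gamma):
    # l_m(t) = sqrt(2 gamma) exp(-gamma t) L_m(2 gamma t)
    return np.sqrt(2.0 * gamma) * np.exp(-gamma * t) * laguerre_poly(m, 2.0 * gamma * t)

gamma = 1.5                                  # illustrative Laguerre scale
t = np.linspace(0.0, 30.0, 30001)
dt = t[1] - t[0]
# Gram matrix: int_0^inf l_m(t) l_n(t) dt should equal delta_{mn}
gram = np.array([[np.sum(laguerre_function(m, t, gamma) *
                         laguerre_function(n, t, gamma)) * dt
                  for n in range(4)] for m in range(4)])
```

The Gram matrix is close to the identity, confirming that the basis is orthonormal up to discretization error.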

4 Proposed Method

In LFMs, the IRF is assumed to be known beforehand; this allows the CGP to predict the input values solely from the output data. Here, we are interested in approximating the IRF by means of Laguerre functions, which requires input-output data. Thus, the IRF of the d-th output is approximated by the Laguerre functions as

$$\begin{aligned} \hat{h}_d(t) = \sum _{m=0}^{M}c_{d,m} l_{d,m}(t), \end{aligned}$$
(6)

where \(c_{d,m}\) weights the m-th Laguerre function of the d-th output. Then (4) becomes

$$\begin{aligned} f_{d}(t) = \sum _{m=0}^{M}c_{d,m} \int _0^t l_{d,m}(t-\tau ) \sum _{q=1}^{Q} u_q(\tau )\text {d}\tau . \end{aligned}$$
(7)

Next, we proceed to define the covariance functions for the new model described in (7). Thus, the covariance function \(k^{(q)}_{f_d, f_{d'}}(t,t')\) becomes

$$ \int _0^t \hat{h}_d(t-\tau ) \int _0^{t'} \hat{h}_{d'}(t'-\tau ') k_{u_q,u_q} (\tau , \tau ') \text {d}\tau ' \text {d}\tau , $$

and \(k_{f_d,u_q}(t,t')\) is rewritten as

$$ \sum _{m=0}^{M}c_{d,m} \int _0^t l_{d,m}(t-\tau ) k_{u_q,u_q} (\tau , t') \text {d}\tau . $$

Note that, if \(m>0\), the convolutions in the above covariance functions have no closed form. Consequently, we approximate the convolutions by discrete sums, as in [6]. This increases the computation time for evaluating the covariance function, but it also allows the model to use any covariance function for the inputs.
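The discrete-sum approximation can be sketched for the output-input cross-covariance \(k_{f_d,u_q}(t,t')\); the IRF approximation \(\hat{h}\) and the input kernel below are illustrative stand-ins, not the learned quantities.

```python
# Sketch of the discrete-sum approximation of
# k_{f,u}(t, t') = int_0^t h_hat(t - tau) k_uu(tau, t') dtau.
# h_hat and the squared exponential input kernel are placeholders.
import numpy as np

def k_uu(z, z2, l=0.8):
    return np.exp(-((z - z2) ** 2) / l ** 2)

def h_hat(t):
    return np.exp(-t)              # placeholder IRF approximation

def k_fu(t, t_prime, n_quad=200):
    """Rectangular-sum approximation of the cross-covariance."""
    tau = np.linspace(0.0, t, n_quad, endpoint=False)
    dtau = t / n_quad
    return np.sum(h_hat(t - tau) * k_uu(tau, t_prime)) * dtau

# Evaluate a small block of the cross-covariance matrix
K = np.array([[k_fu(t, tp) for tp in (0.5, 1.0)] for t in (1.0, 2.0)])
```

Each matrix entry requires one quadrature sum, which is the source of the extra computational cost mentioned above.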

4.1 Hyperparameter Learning

Let us assume we are given observations of the Q inputs and D outputs in vectors \(\{\mathbf {u}_q\}_{q=1}^{Q}\) and \(\{\mathbf {y}_d\}_{d=1}^D\), respectively. By conditioning the proposed GP model on this finite set of observations, the model becomes the following multivariate normal distribution

$$ {\mathbf {g}}= \left[ \begin{array}{c} \mathbf {y}\\ \mathbf {u}\end{array} \right] \sim \mathcal {N}\left( \varvec{0}, {\mathbf {K}}_{{\mathbf {g}}} \right) , $$

where \({\mathbf {g}}\) is obtained by stacking in one column the vectors \(\mathbf {y}\) and \(\mathbf {u}\). Furthermore, the covariance function for \({\mathbf {g}}\) is defined as follows,

$$\begin{aligned} {\mathbf {K}}_{{\mathbf {g}}} = \left[ \begin{array}{cc} {\mathbf {K}}_{\mathbf {y}, \mathbf {y}} &amp; {\mathbf {K}}_{\mathbf {f},\mathbf {u}} \\ {\mathbf {K}}_{\mathbf {u}, \mathbf {f}} &amp; {\mathbf {K}}_{\mathbf {u}, \mathbf {u}} \end{array} \right] , \end{aligned}$$
(8)

where each \({\mathbf {K}}_{i,j}\) (with i and j being y, f or u) is obtained by evaluating the covariance function \(k_{i,j}(t,t')\) at the time inputs \(\mathbf {t}\) associated with the observations. From the above definitions, the set of parameters \(\theta = \{l_q, \sigma ^2_d, \gamma _d, c_{d,m} \}\) (where m indexes the Laguerre functions as described in (6), and \(\gamma _d\) is the Laguerre scale of the d-th output) is learned by maximizing the logarithm of the marginal likelihood, which is given by

$$ \log p({\mathbf {g}}|\mathbf {t}) = -\frac{1}{2}{\mathbf {g}}^{\top } {\mathbf {K}}_{\mathbf {g}}^{-1}{\mathbf {g}}- \frac{1}{2}\log |{\mathbf {K}}_{\mathbf {g}}| - \frac{N}{2}\log (2\pi ) $$

where N is the total number of data points [7].
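The log marginal likelihood above can be sketched with a Cholesky factorization for numerical stability; the kernel matrix and data below are toy choices, not the CGP covariance of (8).

```python
# Sketch of the log marginal likelihood
# log p(g) = -1/2 g^T K^{-1} g - 1/2 log|K| - N/2 log(2 pi),
# computed via a Cholesky factorization.
import numpy as np

def log_marginal_likelihood(g, K):
    N = g.shape[0]
    L = np.linalg.cholesky(K)                   # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, g))   # K^{-1} g
    return (-0.5 * g @ alpha
            - np.sum(np.log(np.diag(L)))        # = 1/2 log|K|
            - 0.5 * N * np.log(2.0 * np.pi))

# Toy example: squared exponential kernel plus noise variance
t = np.linspace(0.0, 1.0, 10)
K = np.exp(-(t[:, None] - t[None, :]) ** 2 / 0.3 ** 2) + 0.1 * np.eye(10)
g = np.sin(2.0 * np.pi * t)
ll = log_marginal_likelihood(g, K)
```

In practice this objective would be maximized w.r.t. \(\theta\) with a gradient-based optimizer, re-evaluating the covariance matrix at each step.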

5 Results

In this section, two different numerical problems are given to illustrate the properties of the proposed model.

5.1 Approximation of the Impulse Response

In this experiment, we show the ability of the proposed model to recover the impulse response function by means of the Laguerre functions encoded within the covariance function. We first generate 50 data points equally spaced over the range [0, 4] s by sampling \(\mathbf {u}\) from the GP defined in (5) with \(l_q = 0.8\). Then, the output data \(\mathbf {f}\) are obtained by passing \(\mathbf {u}\) through a second-order dynamical system characterized by the following IRF

$$ h(t) = \frac{1}{\omega }\exp \left( -\frac{b_1 t}{2} \right) \sinh (\omega t), $$

with \(b_0 = 1\), \(b_1 = 4\) and \(\omega = \sqrt{b_1^2- 4b_0}/2\). By stacking the data vectors \(\mathbf {f}\) and \(\mathbf {u}\), we proceed to learn a model with \(M=10\) Laguerre basis functions using the procedure described in Sect. 4.1. The model is learned 10 times with different initializations of the parameters \(\theta \). Figure 1 shows the mean and two standard deviations calculated from the 10 IRFs learned by the proposed model. The mean function fits the true IRF well, but the standard deviation indicates that some of the obtained approximations tend to be less accurate towards the end of the available data. This effect is produced when high-order Laguerre functions are used to fit smooth IRFs. As discussed in [5], this problem can be addressed by forcing the higher-order weights \(c_{d,m}\) to be zero. In other words, overfitting can be reduced by setting to zero the coefficients of the Laguerre functions related to high-frequency components.
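As a sanity check (not part of the experiment itself), the second-order IRF above can be verified numerically: with \(b_0=1\) and \(b_1=4\) it should satisfy the homogeneous ODE \(h'' + b_1 h' + b_0 h = 0\) for \(t>0\), which is the defining equation of the impulse response.

```python
# Sketch: the IRF h(t) = exp(-b1 t / 2) sinh(omega t) / omega of the
# second-order system, with a finite-difference check of
# h'' + b1 h' + b0 h = 0 for t > 0.
import numpy as np

b0, b1 = 1.0, 4.0
omega = np.sqrt(b1 ** 2 - 4.0 * b0) / 2.0      # sqrt(3) for these values

def h(t):
    return np.exp(-b1 * t / 2.0) * np.sinh(omega * t) / omega

t = np.linspace(0.1, 4.0, 1000)
dt = t[1] - t[0]
h_vals = h(t)
h1 = np.gradient(h_vals, dt)                   # h'
h2 = np.gradient(h1, dt)                       # h''
residual = h2 + b1 * h1 + b0 * h_vals          # ~ 0 away from the edges
```

The residual is of the order of the finite-difference error in the interior of the grid, confirming the closed-form IRF.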

Fig. 1.

Mean and two standard deviations estimated from the impulse responses learned at example Sect. 5.1.

Fig. 2.

Prediction of missing data (Test data) for the MIMO system described in Sect. 5.2

5.2 Prediction of Missing Input/Output Values

For this experiment, we are interested in predicting missing input and output values. First, we configure a MIMO system with 2 inputs and 2 outputs, described by the following equations

$$\begin{aligned} \frac{\text {d}^4 f_1(t)}{\text {d}t^4} + 4\frac{\text {d}^3 f_1(t)}{\text {d}t^3} + 9\frac{\text {d}^2 f_1(t)}{\text {d}t^2} + 14\frac{\text {d} f_1(t)}{\text {d}t} + 8 f_1(t) = \bar{u}(t), \end{aligned}$$
$$\begin{aligned} \frac{\text {d} f_2(t)}{\text {d}t} + f_2(t) = \bar{u}(t), \end{aligned}$$

with \(\bar{u}(t) = \sum _{q=1}^{2}u_q(t)\). Then, 50 data points are generated for the input and output variables (following steps similar to those described in Sect. 5.1). Additionally, the output data are corrupted by an additive white noise process, as shown in (3). In order to show the ability of the proposed model to deal with missing data, the dataset is divided into 2 subsets (training and testing), as shown in Fig. 2.
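As an illustrative aside, the true IRFs implied by the two ODEs above can be computed by partial fractions over the (simple) poles of each transfer function \(H_d(s) = 1/B_d(s)\); the time grid below is arbitrary.

```python
# Sketch: IRFs of the two systems via partial fractions. For simple
# poles p_i of B(s), h(t) = sum_i exp(p_i t) / B'(p_i), where
# B'(p_i) = prod_{j != i} (p_i - p_j) for a monic denominator.
import numpy as np

den1 = np.array([1.0, 4.0, 9.0, 14.0, 8.0])    # output 1 (fourth order)
den2 = np.array([1.0, 1.0])                    # output 2 (first order)

def irf_from_denominator(den, t):
    poles = np.roots(den)
    h = np.zeros_like(t, dtype=complex)
    for i, p in enumerate(poles):
        h += np.exp(p * t) / np.prod(p - np.delete(poles, i))
    return h.real

t = np.linspace(0.0, 10.0, 500)
h1 = irf_from_denominator(den1, t)
h2 = irf_from_denominator(den2, t)
# h2 matches the closed form exp(-t) of the first-order system
```

These are the reference curves against which the Laguerre approximations in Fig. 3 can be compared.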

The missing data of output 2 are predicted from the data of output 1 and the inputs, while the missing data of input 1 are predicted only from the outputs (according to the covariance function defined in (8)). Figure 3 shows the IRFs estimated for both outputs. The estimated IRF for output 1 is close to the true one, but the estimation of the IRF for output 2 is clearly affected by the missing data. Nevertheless, the predictions for the testing data fit well.

Fig. 3.

Impulse response approximation for the MIMO system described in Sect. 5.2

6 Discussion and Future Work

Our experiments demonstrated that the proposed model is able to estimate the continuous-time IRF of LTI systems in the presence of noise corruption and missing input and output data. The IRF can also be approximated in a non-parametric manner by placing a GP prior over this function. In [11], the Gaussian Process Convolution Model (GPCM) is introduced, described as a continuous-time non-parametric window moving-average process. One advantage of this model is that it can be analyzed in the frequency domain as well. Another non-parametric approximation is proposed in [9], where a specific covariance matrix is built using a stable spline kernel.

The above-mentioned methods require approximating the posterior distribution of the IRF with specialized algorithms: the GPCM requires a variational inference approach, while in [9] an Expectation-Maximization algorithm is adopted. In contrast, in the model proposed here, the IRF is estimated using a set of orthogonal basis functions (Laguerre functions) that include a dynamical representation. Additionally, the proposed model is learned using the standard GP training method [7].

This model can be easily extended by using other sets of orthonormal basis functions. In particular, selecting a basis set that admits closed-form convolutions would be advantageous.