1 Introduction

Ordinary differential equations (ODEs) are used to describe Linear Time-Invariant (LTI) systems in many fields, such as biology, engineering, and economics [3]. Generally, an ODE can be represented as

$$\begin{aligned} b_p\frac{\text {d}^p f(t)}{\text {d}t^p} + \ldots + b_1\frac{\text {d} f(t)}{\text {d}t} + b_0 f(t) = u(t), \end{aligned}$$
(1)

where p is the order of the differential equation, \(b_i\) (with \(i \in \{0, 1,\ldots ,\ p \}\)) is the coefficient that weights the i-th derivative of f(t) w.r.t. t, u(t) is the forcing function, and f(t) is the solution function. Equation (1) can be rewritten as the following convolution,

$$\begin{aligned} f(t) = \int _0^t h(\tau )u(t-\tau )\text {d}\tau , \end{aligned}$$
(2)

where h(t) is a function of all the \(b_i\)’s and is also known as the impulse response function (IRF).
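As an illustrative sketch (not part of the original formulation), the convolution in (2) can be checked numerically for a first-order system, \(\text {d}f/\text {d}t + f(t) = u(t)\), whose IRF is \(h(t) = e^{-t}\); the forcing function and grid below are arbitrary choices, and the result is compared against the closed-form solution.

```python
# Sketch: solving df/dt + f(t) = u(t) via the convolution in Eq. (2).
# The IRF of this first-order system is h(t) = exp(-t); u(t) = sin(t)
# is an illustrative forcing function.
import numpy as np

t = np.linspace(0.0, 5.0, 501)
dt = t[1] - t[0]
u = np.sin(t)                      # forcing function u(t)
h = np.exp(-t)                     # impulse response h(t) = e^{-t}

# f(t) = int_0^t h(tau) u(t - tau) dtau, approximated by a discrete sum
f_conv = np.convolve(h, u)[: len(t)] * dt

# Closed-form solution of df/dt + f = sin(t), f(0) = 0, for comparison:
# f(t) = (sin(t) - cos(t) + e^{-t}) / 2
f_exact = 0.5 * (np.sin(t) - np.cos(t) + np.exp(-t))
```

The discrepancy between the discrete sum and the exact solution is of the order of the step size, which motivates the discrete-sum approximations used later in the paper.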

In [2], a Gaussian Process (GP) model is introduced where the covariance function of the output is built from the convolution of a smoothing kernel and the covariance of a white noise process. This model was later extended into Latent Force Models (LFMs) [1] by replacing the smoothing kernel with the system’s IRF; thus, in the LFM framework, the IRF is encoded within the covariance function. Moreover, LFMs are an appealing tool because system identification can be performed by learning the model from the available output data.

In many system identification applications, the order of the ODE and its parametrization are unknown. This problem has been addressed by approximating the IRF using Orthogonal Basis Functions (OBFs) [8, 10]. Laguerre functions are a type of OBF characterized by a single pole, i.e. one parameter that controls the waveform of the basis functions. Laguerre functions have been used in system identification tasks, as in [5, 12].

In this paper, we propose to estimate the IRF using the Laguerre functions encoded in the covariance function of a Convolved Gaussian Process (CGP). Laguerre parameters are learned during the maximization of the CGP marginal likelihood function.

This paper is organized as follows. In Sect. 2, a brief overview of CGP theory is given, including the mathematical construction of all covariance functions required to perform system identification. In Sect. 3, the Laguerre functions are introduced. Using the theory developed in the previous sections, the proposed model is presented in Sect. 4. Results showing the ability of the model to estimate the IRF under different scenarios are presented in Sect. 5. Finally, a discussion of similar approaches is given in Sect. 6.

2 Convolved Gaussian Processes

In this section, an overview of the mathematical foundation of CGPs is given. We start by considering a multiple-input multiple-output (MIMO) dynamical system with D outputs and Q inputs. In this system, the d-th output is described by

$$\begin{aligned} y_d(t) = f_d(t) + \epsilon _d(t), \end{aligned}$$
(3)

with \(\epsilon _d(t) \sim \mathcal {N}(0, \sigma _d^2)\) and

$$\begin{aligned} f_{d}(t) = \int _0^t h_d(t-\tau ) \sum _{q=1}^{Q} u_q(\tau )\text {d}\tau , \end{aligned}$$
(4)

where \(h_d(t)\) is a smoothing kernel which corresponds to the system’s IRF [1]. The set of inputs \(\{u_q(z)\}_{q=1}^{Q}\) are independent GPs, and each one is defined as

$$\begin{aligned} u_q \sim \mathcal {GP}(0, k_{u_q,u_q}(z,z')). \end{aligned}$$
(5)

where \(k_{u_q,u_q}(z,z')\) is the covariance function used to model the input functions. As in [1], we adopt the squared exponential covariance function, defined as follows

$$ k_{u_q,u_q}(z,z') = \exp \left( -\frac{(z-z')^2}{l_q^2} \right) , $$

where \(l_q\) is the lengthscale associated with the q-th input. From Eqs. (3), (4) and (5), we are able to model the set of outputs \(\{y_d(t)\}_{d=1}^{D}\) as a GP with zero mean and covariance function given by

$$\begin{aligned} k_{y_d, y_{d'}}(t,t') = k_{f_d, f_{d'}}(t,t') + \sigma _d^2 \delta _{d,d'}\delta _{t,t'}, \end{aligned}$$

with

$$ k_{f_d, f_{d'}}(t,t') = \sum _{q=1}^{Q} k^{(q)}_{f_d, f_{d'}}(t,t'), $$

where \(k^{(q)}_{f_d, f_{d'}}(t,t')\) is defined as

$$ \int _0^t h_d(t-\tau ) \int _0^{t'} h_{d'}(t'-\tau ') k_{u_q,u_q} (\tau , \tau ') \text {d}\tau ' \text {d}\tau . $$

Additionally, we need to define the cross-covariance between outputs and inputs, which is given by

$$ k_{f_d,u_q}(t,t') = \int _0^t h_d(t-\tau ) k_{u_q,u_q} (\tau , t') \text {d}\tau . $$

The covariance function \(k_{f_d,u_q}(t,t')\) allows us to make predictions from the output domain to the input domain, and vice versa. This property is exploited in the proposed model defined in Sect. 4.
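As a minimal sketch of the squared exponential input covariance defined above, the following evaluates the kernel over all pairs of time points; the grid and lengthscale are illustrative choices.

```python
# Sketch of the squared exponential input covariance,
# k(z, z') = exp(-(z - z')^2 / l^2), evaluated on all pairs of points.
import numpy as np

def k_se(z, z2, lengthscale):
    """Squared exponential kernel matrix between grids z and z2."""
    d = z[:, None] - z2[None, :]
    return np.exp(-(d ** 2) / lengthscale ** 2)

z = np.linspace(0.0, 1.0, 5)       # illustrative time grid
K = k_se(z, z, lengthscale=0.5)
# K is symmetric, positive semi-definite, with unit diagonal
```

Matrices of this form are the building blocks of the larger covariance matrices assembled in Sect. 4.1.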

3 Laguerre Functions

Laguerre functions form a complete and orthonormal set of basis functions, where each function is defined as [5]

$$ l_m(t) = \mathcal {L}_m(t) \exp (-\gamma t), $$

where \(\gamma \) is a free parameter known as Laguerre scale [4], and

$$ \mathcal {L}_m(t) = \sqrt{2 \gamma }\sum _{k=0}^{m} \frac{(-1)^{k}m!2^{m-k}}{k![(m-k)!]^2}(2 \gamma t)^{m-k}. $$

The m-th degree polynomial \(\mathcal {L}_m(t)\) is called the m-th Laguerre polynomial. It is also interesting to note that the Laguerre function \(l_m(t)\) has m zero crossings, given by the zeros of \(\mathcal {L}_m(t)\). Additionally, each Laguerre function represents a dynamical system characterized by a single pole with multiplicity; for instance, the Laplace transform of the m-th Laguerre function is given by

$$ L_m(s) = \sqrt{2\gamma } \frac{(\gamma - s)^m}{(\gamma + s)^{m+1}}, $$

where the Laguerre scale parameter \(\gamma \) controls the position of the pole and the zeros of each Laguerre function.
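The Laguerre functions can be sketched numerically using the standard Laguerre polynomials via their three-term recurrence, with \(l_m(t) = \sqrt{2\gamma }\, e^{-\gamma t} L_m(2\gamma t)\) (an equivalent formulation of the expression above, possibly up to a sign convention); the Laguerre scale below is an arbitrary choice, and orthonormality is verified numerically.

```python
# Sketch of the Laguerre functions built from the standard Laguerre
# polynomials; orthonormality on [0, inf) is checked by quadrature.
import numpy as np

def laguerre_poly(m, x):
    # Standard Laguerre polynomials L_m(x) via the three-term recurrence:
    # (k + 1) L_{k+1}(x) = (2k + 1 - x) L_k(x) - k L_{k-1}(x)
    Lprev, Lcur = np.ones_like(x), 1.0 - x
    if m == 0:
        return Lprev
    for k in range(1, m):
        Lprev, Lcur = Lcur, ((2.0 * k + 1.0 - x) * Lcur - k * Lprev) / (k + 1.0)
    return Lcur

def laguerre_function(m, t, gamma):
    # l_m(t) = sqrt(2 gamma) exp(-gamma t) L_m(2 gamma t)
    return np.sqrt(2.0 * gamma) * np.exp(-gamma * t) * laguerre_poly(m, 2.0 * gamma * t)

gamma = 1.5                                  # illustrative Laguerre scale
t = np.linspace(0.0, 30.0, 30001)
dt = t[1] - t[0]
# Gram matrix: int_0^inf l_m(t) l_n(t) dt should equal delta_{mn}
gram = np.array([[np.sum(laguerre_function(m, t, gamma) *
                         laguerre_function(n, t, gamma)) * dt
                  for n in range(4)] for m in range(4)])
```

The Gram matrix is close to the identity, confirming that the basis is orthonormal up to discretization error.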

4 Proposed Method

In LFMs, the IRF is assumed to be known beforehand; this allows the CGP to predict the input values solely from the output data. Here, we are interested in approximating the IRF by means of Laguerre functions, which requires input-output data. Thus, the IRF of the d-th output is approximated by the Laguerre functions as

$$\begin{aligned} \hat{h}_d(t) = \sum _{m=0}^{M}c_{d,m} l_{d,m}(t), \end{aligned}$$
(6)

where \(c_{d,m}\) weights the m-th Laguerre function of the d-th output. Then (4) becomes

$$\begin{aligned} f_{d}(t) = \sum _{m=0}^{M}c_{d,m} \int _0^t l_{d,m}(t-\tau ) \sum _{q=1}^{Q} u_q(\tau )\text {d}\tau . \end{aligned}$$
(7)

Next, we proceed to define the covariance functions for the new model described in (7). Thus, the covariance function \(k^{(q)}_{f_d, f_{d'}}(t,t')\) becomes

$$ \int _0^t \hat{h}_d(t-\tau ) \int _0^{t'} \hat{h}_{d'}(t'-\tau ') k_{u_q,u_q} (\tau , \tau ') \text {d}\tau ' \text {d}\tau , $$

and \(k_{f_d,u_q}(t,t')\) is rewritten as

$$ \sum _{m=0}^{M}c_{d,m} \int _0^t l_{d,m}(t-\tau ) k_{u_q,u_q} (\tau , t') \text {d}\tau . $$

Note that, if \(m>0\), the convolutions in the above covariance functions have no closed form. Consequently, we approximate the convolutions by discrete sums, as in [6]. This increases the computation time for evaluating the covariance function, but it also allows the model to use any covariance function for the inputs.
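The discrete-sum approximation can be sketched for the output-input cross-covariance \(k_{f_d,u_q}(t,t')\); the IRF approximation \(\hat{h}\) and the input kernel below are illustrative stand-ins, not the learned quantities.

```python
# Sketch of the discrete-sum approximation of
# k_{f,u}(t, t') = int_0^t h_hat(t - tau) k_uu(tau, t') dtau.
# h_hat and the squared exponential input kernel are placeholders.
import numpy as np

def k_uu(z, z2, l=0.8):
    return np.exp(-((z - z2) ** 2) / l ** 2)

def h_hat(t):
    return np.exp(-t)              # placeholder IRF approximation

def k_fu(t, t_prime, n_quad=200):
    """Rectangular-sum approximation of the cross-covariance."""
    tau = np.linspace(0.0, t, n_quad, endpoint=False)
    dtau = t / n_quad
    return np.sum(h_hat(t - tau) * k_uu(tau, t_prime)) * dtau

# Evaluate a small block of the cross-covariance matrix
K = np.array([[k_fu(t, tp) for tp in (0.5, 1.0)] for t in (1.0, 2.0)])
```

Each matrix entry requires one quadrature sum, which is the source of the extra computational cost mentioned above.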

4.1 Hyperparameter Learning

Let us assume we are given observations of the Q inputs and D outputs in vectors \(\{\mathbf {u}_q\}_{q=1}^{Q}\) and \(\{\mathbf {y}_d\}_{d=1}^D\), respectively. By conditioning the proposed GP model on this finite set of observations, the model becomes the following multivariate normal distribution

$$ {\mathbf {g}}= \left[ \begin{array}{c} \mathbf {y}\\ \mathbf {u}\end{array} \right] \sim \mathcal {N}\left( \varvec{0}, {\mathbf {K}}_{{\mathbf {g}}} \right) , $$

where \({\mathbf {g}}\) is obtained by stacking in one column the vectors \(\mathbf {y}\) and \(\mathbf {u}\). Furthermore, the covariance function for \({\mathbf {g}}\) is defined as follows,

$$\begin{aligned} {\mathbf {K}}_{{\mathbf {g}}} = \left[ \begin{array}{cc} {\mathbf {K}}_{\mathbf {y}, \mathbf {y}} &amp; {\mathbf {K}}_{\mathbf {f},\mathbf {u}} \\ {\mathbf {K}}_{\mathbf {u}, \mathbf {f}} &amp; {\mathbf {K}}_{\mathbf {u}, \mathbf {u}} \end{array} \right] , \end{aligned}$$
(8)

where each \({\mathbf {K}}_{i,j}\) (with i and j being y, f or u) is obtained by evaluating the covariance function \(k_{i,j}(t,t')\) at the time inputs \(\mathbf {t}\) associated with the observations. From the above definitions, the set of parameters \(\theta = \{l_q, \sigma ^2_d, \gamma _d, c_{d,m} \}\) (where m indexes the Laguerre functions as described in (6), and \(\gamma _d\) is the Laguerre scale of the d-th output) is learned by maximizing the logarithm of the marginal likelihood, which is given by

$$ \log p({\mathbf {g}}|\mathbf {t}) = -\frac{1}{2}{\mathbf {g}}^{\top } {\mathbf {K}}_{\mathbf {g}}^{-1}{\mathbf {g}}- \frac{1}{2}\log |{\mathbf {K}}_{\mathbf {g}}| - \frac{N}{2}\log (2\pi ) $$

where N is the total number of data points [7].
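The log marginal likelihood above can be sketched with a Cholesky factorization for numerical stability; the kernel matrix and data below are toy choices, not the CGP covariance of (8).

```python
# Sketch of the log marginal likelihood
# log p(g) = -1/2 g^T K^{-1} g - 1/2 log|K| - N/2 log(2 pi),
# computed via a Cholesky factorization.
import numpy as np

def log_marginal_likelihood(g, K):
    N = g.shape[0]
    L = np.linalg.cholesky(K)                   # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, g))   # K^{-1} g
    return (-0.5 * g @ alpha
            - np.sum(np.log(np.diag(L)))        # = 1/2 log|K|
            - 0.5 * N * np.log(2.0 * np.pi))

# Toy example: squared exponential kernel plus noise variance
t = np.linspace(0.0, 1.0, 10)
K = np.exp(-(t[:, None] - t[None, :]) ** 2 / 0.3 ** 2) + 0.1 * np.eye(10)
g = np.sin(2.0 * np.pi * t)
ll = log_marginal_likelihood(g, K)
```

In practice this objective would be maximized w.r.t. \(\theta\) with a gradient-based optimizer, re-evaluating the covariance matrix at each step.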

5 Results

In this section, two different numerical problems are given to illustrate the properties of the proposed model.

5.1 Approximation of the Impulse Response

In this experiment, we show the ability of the proposed model to recover the impulse response function by means of the Laguerre functions encoded within the covariance function. We first generate 50 data points equally spaced over the range [0, 4] s by sampling \(\mathbf {u}\) from the GP defined in (5) with \(l_q = 0.8\). Then, the output data \(\mathbf {f}\) are obtained by passing \(\mathbf {u}\) through a second-order dynamical system characterized by the following IRF

$$ h(t) = \frac{1}{\omega }\exp \left( -\frac{b_1 t}{2} \right) \sinh (\omega t), $$

with \(b_0 = 1\), \(b_1 = 4\) and \(\omega = \sqrt{b_1^2- 4b_0}/2\). By stacking the data vectors \(\mathbf {f}\) and \(\mathbf {u}\), we proceed to learn a model with \(M=10\) Laguerre basis functions using the procedure described in Sect. 4.1. The model is learned 10 times with different initializations of the parameters \(\theta \). Figure 1 shows the mean and two standard deviations calculated from the 10 IRFs learned by the proposed model. The mean function fits the true IRF well, but the standard deviation indicates that some of the obtained approximations tend to be less accurate towards the end of the available data. This effect is produced when high-order Laguerre functions are used to fit smooth IRFs. As discussed in [5], this problem can be addressed by forcing the higher-order weights \(c_{d,m}\) to be zero. In other words, overfitting can be reduced by setting to zero the coefficients of the Laguerre functions related to high-frequency components.
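As a sanity check (not part of the experiment itself), the second-order IRF above can be verified numerically: with \(b_0=1\) and \(b_1=4\) it should satisfy the homogeneous ODE \(h'' + b_1 h' + b_0 h = 0\) for \(t>0\), which is the defining equation of the impulse response.

```python
# Sketch: the IRF h(t) = exp(-b1 t / 2) sinh(omega t) / omega of the
# second-order system, with a finite-difference check of
# h'' + b1 h' + b0 h = 0 for t > 0.
import numpy as np

b0, b1 = 1.0, 4.0
omega = np.sqrt(b1 ** 2 - 4.0 * b0) / 2.0      # sqrt(3) for these values

def h(t):
    return np.exp(-b1 * t / 2.0) * np.sinh(omega * t) / omega

t = np.linspace(0.1, 4.0, 1000)
dt = t[1] - t[0]
h_vals = h(t)
h1 = np.gradient(h_vals, dt)                   # h'
h2 = np.gradient(h1, dt)                       # h''
residual = h2 + b1 * h1 + b0 * h_vals          # ~ 0 away from the edges
```

The residual is of the order of the finite-difference error in the interior of the grid, confirming the closed-form IRF.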

Fig. 1.

Mean and two standard deviations estimated from the impulse responses learned at example Sect. 5.1.

Fig. 2.

Prediction of missing data (Test data) for the MIMO system described in Sect. 5.2

5.2 Prediction of Missing Input/Output Values

For this experiment, we are interested in predicting missing input and output values. First, we configure a MIMO system with 2 inputs and 2 outputs, described by the following equations

$$\begin{aligned} \frac{\text {d}^4 f_1(t)}{\text {d}t^4} + 4\frac{\text {d}^3 f_1(t)}{\text {d}t^3} + 9\frac{\text {d}^2 f_1(t)}{\text {d}t^2} + 14\frac{\text {d} f_1(t)}{\text {d}t} + 8 f_1(t) = \bar{u}(t), \end{aligned}$$
$$\begin{aligned} \frac{\text {d} f_2(t)}{\text {d}t} + f_2(t) = \bar{u}(t), \end{aligned}$$

with \(\bar{u}(t) = \sum _{q=1}^{2}u_q(t)\). Then, 50 data points are generated for the input and output variables (following steps similar to those described in Sect. 5.1). Additionally, the output data are corrupted by an additive white noise process, as shown in (3). In order to show the ability of the proposed model to deal with missing data, the dataset is divided into 2 subsets (training and testing), as shown in Fig. 2.
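As an illustrative aside, the true IRFs implied by the two ODEs above can be computed by partial fractions over the (simple) poles of each transfer function \(H_d(s) = 1/B_d(s)\); the time grid below is arbitrary.

```python
# Sketch: IRFs of the two systems via partial fractions. For simple
# poles p_i of B(s), h(t) = sum_i exp(p_i t) / B'(p_i), where
# B'(p_i) = prod_{j != i} (p_i - p_j) for a monic denominator.
import numpy as np

den1 = np.array([1.0, 4.0, 9.0, 14.0, 8.0])    # output 1 (fourth order)
den2 = np.array([1.0, 1.0])                    # output 2 (first order)

def irf_from_denominator(den, t):
    poles = np.roots(den)
    h = np.zeros_like(t, dtype=complex)
    for i, p in enumerate(poles):
        h += np.exp(p * t) / np.prod(p - np.delete(poles, i))
    return h.real

t = np.linspace(0.0, 10.0, 500)
h1 = irf_from_denominator(den1, t)
h2 = irf_from_denominator(den2, t)
# h2 matches the closed form exp(-t) of the first-order system
```

These are the reference curves against which the Laguerre approximations in Fig. 3 can be compared.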

The missing data of output 2 are predicted from the data of output 1 and the inputs, while the missing data of input 1 are predicted only from the outputs (according to the covariance function defined in (8)). Figure 3 shows the IRFs estimated for both outputs. The estimated IRF for output 1 is close to the true one, but the estimation of the IRF for output 2 is clearly affected by the missing data. Nevertheless, the predictions for the testing data fit well.

Fig. 3.

Impulse response approximation for the MIMO system described in Sect. 5.2

6 Discussion and Future Work

Our experiments demonstrated that the proposed model is able to estimate the continuous-time IRF of LTI systems in the presence of noise corruption and missing input and output data. The IRF can also be approximated in a non-parametric manner by placing a GP prior over this function. In [11], the Gaussian Process Convolution Model (GPCM) is introduced, described as a continuous-time non-parametric window moving-average process. One advantage of this model is that it can be analyzed in the frequency domain as well. Another non-parametric approximation is proposed in [9], where a specific covariance matrix is built using a stable spline kernel.

The above-mentioned methods require approximating the posterior distribution of the IRF with specialized algorithms: the GPCM requires a variational inference approach, while in [9] an Expectation-Maximization algorithm is adopted. In contrast, in the model proposed here, the IRF is estimated using a set of orthogonal basis functions (Laguerre functions) that include a dynamical representation. Additionally, the proposed model is learned using the standard GP training method [7].

This model can be easily extended by using other sets of orthonormal basis functions. In particular, selecting a basis set that admits closed-form convolutions would be advantageous.