On resampling and uncertainty estimation in Linear System Identification☆
Introduction
Robust model-based control requires quantification of plant model uncertainty (Goodwin et al., 1999, Kosut et al., 1992, Ljung, 1999, Ninness and Goodwin, 1995). System identification methods can be ill-equipped to provide a measure of parameter uncertainty other than that based on asymptotic-in-data variance formulæ derived from the Central Limit Theorem, which in turn rests on a Taylor expansion of the empirical identification cost function about the correct parameter value (Ljung, 1999, Söderström and Stoica, 1989). Recent studies of under-excited systems (Garatti et al., 2004, Garatti et al., 2006) have exhibited cases where the cost function is non-convex, with separated local minima. In such cases, the uncertainty characterization from asymptotic theory can be misleading.
Here we seek to develop an approach to the empirical calculation of the underlying distribution function of the parameter estimate. The approach remains valid when the cost function is non-convex; asymptotically, as the number of data points tends to infinity, it fully characterizes the finite-data parameter distribution; and, for fixed-length records, it yields a quantification of the error between the empirical distribution and the true underlying (and unknown) distribution. The approach is based on the resampling ideas of the Bootstrap, the Jackknife, and Subsampling (Politis, 1998, Zoubir and Boashash, 1998). Our aim is to use the data to develop an approximation of the actual distribution function of the parameter estimate, based on the assumption that the data set is representative of the underlying stochastic processes.
We assume:
- We have N input–output pairs of data (u(t), y(t)), t = 1, …, N, where u(t) and y(t) are scalars.
- These data are stationary and generated by a stable bivariate ARMA process: the vector (y(t), u(t)) is driven by a bivariate i.i.d. process through AR and MA polynomial matrices (2×2 and 2×1, respectively) in the forward shift operator. This description encompasses open-loop as well as closed-loop configurations.
- We seek to fit a fixed-order, fixed-structure model parametrized by θ to the N-point data set and to characterize the uncertainty in this parameter value. Specifically, we choose the empirical quadratic prediction-error cost function V_N(θ) = (1/N) Σ_{t=1}^{N} (y(t) − ŷ(t|θ))², where ŷ(t|θ) is the optimal predictor based on the model corresponding to θ. If the data set to which the cost function refers is clear from the context, we shall write V(θ) in place of V_N(θ). The minimizer of V_N(θ) (assuming it is unique) is indicated by θ̂_N. Our goal is to reconstruct its probability distribution, hereafter indicated by F_N.
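As a concrete (hypothetical) instance of this setup, the sketch below simulates a simple scalar ARX system, a special case of the stable ARMA framework above, and computes the prediction-error estimate; for this model structure the minimizer of the quadratic cost is the least-squares solution. The system, its coefficients, and all function names are illustrative assumptions, not the paper's SMS example.

```python
import numpy as np

def simulate_arx(n, a=0.7, b=1.0, noise_std=0.5, seed=0):
    """Simulate a hypothetical stable ARX system
    y(t) = a*y(t-1) + b*u(t-1) + e(t), with i.i.d. input and noise."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)
    e = noise_std * rng.standard_normal(n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = a * y[t - 1] + b * u[t - 1] + e[t]
    return u, y

def fit_arx(u, y):
    """Minimize the empirical quadratic prediction-error cost V_N(theta);
    here this reduces to ordinary least squares on the regressors."""
    Phi = np.column_stack([y[:-1], u[:-1]])    # regressors for yhat(t|theta)
    theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
    return theta                               # [a_hat, b_hat]

u, y = simulate_arx(2000)
theta_hat = fit_arx(u, y)
```

With this much data and a convex (quadratic-in-parameters) cost, the estimate lands close to the true coefficients; the pathologies discussed in the paper arise for richer model structures with non-convex costs.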
The paper is organized as follows. First, an example showing the limitations of the asymptotic theory of system identification is presented in Section 2. Then, some resampling strategies (namely, Monte Carlo, Subsampling, Model-based Jackknife, and Model-based Bootstrapping) are briefly recalled in Section 3, with particular emphasis on their application in the system identification setting. The analysis of the resampling techniques is given in Section 4 (Analysis of the Subsampling method) and Section 5 (Analysis of Model-based Jackknife & Bootstrap), while Section 6 provides a comparison based on the same example on which the asymptotic theory performed poorly.
Asymptotic theory and its limitations — the SMS example
The following example is taken from Garatti et al. (2004); its name is an acronym of the authors’ first names. It shows a (somewhat contrived) situation where blind use of the asymptotic theory of system identification as in Ljung (1999) and Söderström and Stoica (1989) leads to an unreliable estimate of uncertainty unless the number of data points is exceedingly large.
Consider the following data-generating system:
Resampling strategies
As shown in Section 2, there are cases where one cannot rely on the asymptotic theory of system identification for a reliable description of the probability distribution of the identified parameter vector, even with seemingly large values of the data length N. In particular, the SMS example reveals a circumstance where there are two closely competing but geometrically separated minimum points, and the asymptotic theory fails to reveal this dichotomy of solutions.
In order to provide a fair evaluation of
Analysis of the Subsampling method
In this section, we establish our main theoretical result concerning the consistency of the Subsampling procedure. To be precise, we will show that the probability distribution reconstructed via Subsampling is a consistent estimate of the actual distribution F_N of the parameter estimate identified with N data points. The proof relies on the fact that, in ARMAX processes, the dependence between data at two different time instants vanishes as the time lag between them increases,
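In outline, Subsampling re-estimates the parameter on each contiguous block of length b ≪ N and takes the empirical distribution of the recentred, rescaled block estimates as an approximation of the sampling distribution of the full-data estimate. A minimal sketch on a hypothetical AR(1) system (the model, coefficients, block length, and the √-rate normalization are illustrative assumptions, not the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical AR(1) data y(t) = a*y(t-1) + e(t); illustrative only.
a_true, n, b = 0.5, 2000, 200          # b = block length, b << n
e = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = a_true * y[t - 1] + e[t]

def fit_ar1(y):
    # Least-squares (prediction-error) estimate of the AR coefficient.
    return float(y[:-1] @ y[1:] / (y[:-1] @ y[:-1]))

a_hat = fit_ar1(y)                      # full-sample estimate

# Re-estimate on every contiguous block of length b. The empirical law of
# sqrt(b)*(block estimate - a_hat) approximates that of
# sqrt(n)*(a_hat - a_true), assuming the usual sqrt-n convergence rate.
block_est = np.array([fit_ar1(y[i:i + b]) for i in range(n - b + 1)])
subsample_dist = np.sqrt(b) * (block_est - a_hat)
```

Because each block estimate is itself a prediction-error minimizer, the method makes no convexity assumption on the cost: if the block estimates cluster around two separated minima, the reconstructed distribution is bimodal, which is exactly what the asymptotic Gaussian approximation cannot express.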
Analysis of Model-based Jackknife & Bootstrap
As remarked earlier, given the similarity between Model-based Jackknife and Model-based Bootstrapping, we shall concentrate solely on analytical results for the latter.
Unlike Subsampling, the consistency of the Bootstrap procedure has been intensively studied over the last two decades, and many results are available in the literature (Shao & Tu, 1995). In particular, we have the following result from Bose (1988), which mirrors Theorems 1 and 2 for Subsampling.
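For reference, the Model-based Bootstrap proceeds by: fitting the model, computing its residuals, resampling the residuals with replacement, re-simulating data from the fitted model driven by the resampled residuals, and re-estimating. The sketch below illustrates this on a hypothetical AR(1) system; the system, coefficients, and replicate count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical AR(1) system for illustration: y(t) = a*y(t-1) + e(t).
a_true, n = 0.7, 1000
e = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = a_true * y[t - 1] + e[t]

def fit_ar1(y):
    # Least-squares (prediction-error) estimate of the AR coefficient.
    return float(y[:-1] @ y[1:] / (y[:-1] @ y[:-1]))

a_hat = fit_ar1(y)
resid = y[1:] - a_hat * y[:-1]          # one-step prediction residuals

# Model-based Bootstrap: resample residuals, re-simulate from the fitted
# model, and re-estimate; the spread of the replicates approximates the
# sampling distribution of a_hat.
a_boot = np.empty(300)
for k in range(300):
    e_star = rng.choice(resid, size=n, replace=True)
    y_star = np.zeros(n)
    for t in range(1, n):
        y_star[t] = a_hat * y_star[t - 1] + e_star[t]
    a_boot[k] = fit_ar1(y_star)
```

Note the contrast with Subsampling: the Bootstrap re-simulates from the identified model, so its replicates inherit that model's structure, whereas Subsampling works directly on blocks of the original record.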
Theorem 3 (Bose (1988), Theorem 3.9). Suppose that:
SMS example redux
Both Subsampling and the Jackknife/Bootstrapping have been applied to the SMS example from Section 2 in order to reconstruct empirically the probability distribution of the identified model parameter. In this section, some results are developed which permit a better understanding of the performance of the Subsampling and Bootstrapping estimators.
Conclusions
In this paper, we considered the problem of reconstructing the probability distribution of the identified model parameter based on a single finite-length data record. After showing that the heuristic use (with finite N) of the classical asymptotic theory of system identification can be misleading, we introduced procedures based on resampling ideas and discussed their advantages and drawbacks. Theorems were developed on Subsampling and compared to the Bootstrap results. A somewhat
References (29)
- Bittanti, S., & Lovera, M. (2000). Bootstrap-based estimates of uncertainty in subspace identification methods. Automatica.
- Campi, M. C., & Weyer, E. (2005). Guaranteed non-asymptotic confidence regions in system identification. Automatica.
- Garatti, S., Campi, M. C., & Bittanti, S. (2004). Assessing the quality of identified models through the asymptotic theory — when is the result reliable? Automatica.
- Garatti, S., Campi, M. C., & Bittanti, S. (2006). The asymptotic model quality assessment for instrumental variable identification revisited. Systems & Control Letters.
- Mokkadem, A. (1988). Mixing properties of ARMA processes. Stochastic Processes and their Applications.
- Bose, A. (1988). Edgeworth correction by Bootstrap in autoregressions. The Annals of Statistics.
- Bosq, D. (1998). Nonparametric statistics for stochastic processes.
- Campi, M. C., & Weyer, E. (2006). Identification with finitely many data points: the LSCR approach. In Proceedings of...
- Convergence of distributions generated by stationary stochastic processes. Theory of Probability and its Applications (1968).
- Dunstan, W. J., & Bitmead, R. R. (2003). Empirical estimation of parameter distributions in system identification. In...
- Efron, B. The jackknife, the bootstrap, and other resampling plans.
- Efron, B. Computer-intensive methods in statistical regression. SIAM Review.
- Efron, B., & Tibshirani, R. J. An introduction to the bootstrap.
Cited by (9)
- A counterexample to the uniqueness of the asymptotic estimate in ARMAX model identification via the correlation approach. Systems and Control Letters (2014).
- Input design as a tool to improve the convergence of PEM. Automatica (2013). Citation excerpt: “As the model order becomes larger, the trend to get trapped in local minima or to ‘converge’ to the boundary of the search space, thus providing a useless model, seems to grow; we provide a couple of such examples in this paper. This problem has received renewed interest in the past few years, as pointed out in Ljung (2010), and different approaches have emerged to cope with it, such as the use of resampling schemes to the input–output data (Garatti & Bitmead, 2010) and the approximation of the process model by (potentially high order) linearly parametrized model structures to obtain convex cost functions (Grossmann, Jones, & Morari, 2009; Hjalmarsson, Welsh, & Rojas, 2012). In this work, we present a different approach to the convergence problem.”
- Detection of radionuclides from weak and poorly resolved spectra using Lasso and subsampling techniques. Radiation Measurements (2011).
- Selecting Sensitive Parameter Subsets in Dynamical Models with Application to Biomechanical System Identification. Journal of Biomechanical Engineering (2018).
- SOC and SOH estimation for Li-ion battery based on an equivalent hydraulic model. Part II: SOH power fade estimation. Proceedings of the American Control Conference (2016).
- Principles of System Identification: Theory and Practice (2014).
Simone Garatti is an Assistant Professor at the Dipartimento di Elettronica ed Informazione of the Politecnico di Milano. He was born in Brescia, Italy, in 1976 and received the Laurea degree and the Ph.D. in Information Technology Engineering in 2000 and 2004, respectively, both from the Politecnico di Milano, Milano, Italy. Dr. Garatti has been a visiting scholar at the Lund University of Technology, Lund, Sweden, at the University of California San Diego (UCSD), San Diego, CA, USA, and at the Massachusetts Institute of Technology and the Northeastern University, Boston, MA, USA. His research interests include system identification and model quality assessment, identification of interval predictor models, and randomized optimization for problems in systems and control.
Robert R. Bitmead was born in Sydney, Australia, in 1954. He received the B.Sc. degree in applied mathematics from the University of Sydney, Sydney, in 1976 and the M.E. and Ph.D. degrees in electrical engineering from the University of Newcastle, Australia, in 1977 and 1979, respectively. He currently holds the Cymer Endowed Chair in the Department of Mechanical and Aerospace Engineering, University of California, San Diego (UCSD). He has been on the Faculty at UCSD since 1999 and has held faculty positions at the Australian National University (1982–1999) and James Cook University of North Queensland (1980–1982). He has held visiting faculty positions at Cornell University; the University of Louvain, Belgium; INRIA France; and Kyoto University, Japan. His research is in the areas of adaptive systems, estimation, control design, modeling, and telecommunications. Dr. Bitmead is a Fellow of the Australian Academy of Technological Sciences and Engineering, the International Federation of Automatic Control and the IEEE.
☆ This work was supported by the National Research Council of Italy (CNR), the MIUR national project “Identification and adaptive control of industrial systems”, and by the US Air Force Office of Scientific Research under Award No. FA9550-05-1-0401. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of AFOSR. A preliminary version was submitted for presentation at the 15th IFAC Symposium on System Identification (SYSID 2009), July 6–8, 2009, Saint-Malo, France. This paper was recommended for publication in revised form by Associate Editor Wolfgang Scherrer under the direction of Editor Torsten Söderström.