Fast robust regression algorithms for problems with Toeplitz structure

https://doi.org/10.1016/j.csda.2007.05.008

Abstract

The problem of computing an approximate solution of an overdetermined system of linear equations is considered. The usual approach to the problem is least squares, in which the 2-norm of the residual is minimized. This produces the minimum variance unbiased estimator of the solution when the errors in the observations are independent and normally distributed with mean 0 and constant variance. It is well known, however, that the least squares solution is not robust if outliers occur, i.e., if some of the observations are contaminated by large error. In this case, alternative approaches have been proposed which judge the size of the residual in a way that is less sensitive to these components. These include the Huber M-function, the Talwar function, the logistic function, the Fair function, and the 1-norm. New algorithms are proposed to compute the solution to these problems efficiently, in particular, when the coefficient matrix A has small displacement rank. Matrices with small displacement rank include matrices that are Toeplitz, block-Toeplitz, block-Toeplitz with Toeplitz blocks, Toeplitz plus Hankel, and a variety of other forms. For exposition, only Toeplitz matrices are considered, but the ideas apply to all matrices with small displacement rank. Algorithms are also presented to compute the solution efficiently when a regularization term is included to handle the case when the coefficient matrix is ill-conditioned or rank-deficient. The techniques are illustrated on a problem of FIR system identification.

Introduction

Consider the approximation problem $Ax \approx b$, where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ ($m \geq n$) are given and $x \in \mathbb{R}^n$ is to be determined. We define the residual $r = b - Ax$. The usual approach to the problem is least squares, in which we minimize the 2-norm of the residual over all choices of x. This produces the minimum variance unbiased estimator of the solution when the errors in the observation b are independent and normally distributed with mean 0 and constant variance.
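For concreteness, here is a minimal numerical sketch of this least squares setup in Python/NumPy; the data are synthetic and purely illustrative:

```python
import numpy as np

# A small synthetic overdetermined system: m = 6 observations, n = 2 unknowns.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2))
x_true = np.array([1.0, -2.0])
b = A @ x_true + 0.01 * rng.standard_normal(6)

# Ordinary least squares: minimize ||b - A x||_2 over all x in R^n.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ x_ls                  # the residual, as defined above
print(x_ls, np.linalg.norm(r))
```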

It is well known, however, that the least squares solution is not robust if outliers occur, i.e., if some of the components of b are contaminated by large error. In this case, alternative approaches have been proposed which judge the size of the residual in a way that is less sensitive to these components. For example, in order to reduce the influence of the outliers, we might replace the least squares problem $\min_x \|r\|_2$ by $$\min_x \sum_{i=1}^{m} \rho(r_i(x)), \quad \text{subject to } r = b - Ax,$$ where $\rho$ is a given function. This robust regression problem has been extensively studied. Taking $\rho(z) = z^2/2$ gives the ordinary linear least squares problem. Other possible functions include the Huber M-function, the Talwar function, the logistic function, the Fair function, and the 1-norm; one common parameterization of these is sketched below.

In Section 2, we consider how the solution to weighted problems can be computed efficiently, in particular, when the matrix A has small displacement rank (Kailath and Sayed, 1999, Kailath and Sayed, 1995). Matrices with small displacement rank include matrices that are Toeplitz, block-Toeplitz, block-Toeplitz with Toeplitz blocks (BTTB), Toeplitz plus Hankel, and a variety of other forms. This structure has been effectively exploited in solving least squares problems (Kailath and Sayed, 1999, Ng, 1996), weighted least squares problems (Benzi and Ng, 2006), total least squares problems (Kalsi and O’Leary, 2006; Mastronardi et al., 2001a, Mastronardi et al., 2004), and regression using other norms (Pruessner and O’Leary, 2003), so we now extend these ideas to robust regression. For exposition, in Section 2 we focus on Toeplitz matrices, but the ideas apply to all matrices with small displacement rank. In Section 3, we also show how to compute the solution efficiently when we include a regularization term in case the matrix A is ill-conditioned or rank-deficient. Section 4 extends these results to robust regression problems. Experiments illustrating these ideas are presented in Section 5, and Section 6 provides our conclusions.
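The following sketch gives one standard parameterization of these loss functions together with their first derivatives, in Python/NumPy. The tuning constants are the conventional 95%-efficiency values from the robust regression literature (see Holland and Welsch, 1977), not values prescribed by this paper:

```python
import numpy as np

# Common robust loss functions rho(z) and their first derivatives rho'(z).
# Each returns (rho, drho) evaluated elementwise; beta is the scale
# (tuning) constant.  These exact forms are one standard convention.

def huber(z, beta=1.345):
    small = np.abs(z) <= beta
    rho  = np.where(small, 0.5 * z**2, beta * np.abs(z) - 0.5 * beta**2)
    drho = np.where(small, z, beta * np.sign(z))
    return rho, drho

def talwar(z, beta=2.795):
    small = np.abs(z) <= beta
    rho  = np.where(small, 0.5 * z**2, 0.5 * beta**2)
    drho = np.where(small, z, 0.0)
    return rho, drho

def fair(z, beta=1.4):
    a = np.abs(z) / beta
    rho  = beta**2 * (a - np.log1p(a))
    drho = z / (1.0 + a)
    return rho, drho

def logistic(z, beta=1.205):
    # log(cosh(x)) written via logaddexp to avoid overflow for large |x|
    x = z / beta
    rho  = beta**2 * (np.logaddexp(x, -x) - np.log(2.0))
    drho = beta * np.tanh(x)
    return rho, drho
```

Least squares corresponds to $\rho(z) = z^2/2$ everywhere; each alternative above grows more slowly than $z^2$ for large residuals, which is what blunts the influence of outliers.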

Section snippets

Weighting to solve well-conditioned problems with outliers

In this section we assume that we have a well-conditioned model in which we know which components of the vector b have large errors. This situation arises, for example, if there are known defects in a set of detectors which reduce the effectiveness of some of them without making their data worthless. In Section 3, we consider ill-conditioned problems, and in Section 4, we extend these ideas to situations in which the number and indices of the outliers are unknown.

Suppose that p of our …
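Though the snippet is truncated here, the core weighting idea admits a short sketch: given the indices of the contaminated rows of b, downweight them through a diagonal matrix D and solve the weighted least squares problem. The dense solve below is purely illustrative; it ignores the Toeplitz structure that the paper's fast algorithms exploit, and the weight value 0.1 is a hypothetical choice:

```python
import numpy as np

def weighted_ls(A, b, bad_rows, w_bad=0.1):
    """Solve min_x || D^{1/2} (b - A x) ||_2 with reduced weights on the
    rows of b known to carry large errors.  Downweighting (rather than
    deleting) those rows keeps their data without letting it dominate."""
    d = np.ones(len(b))
    d[bad_rows] = w_bad            # reduced influence for known outliers
    s = np.sqrt(d)                 # the diagonal of D^{1/2}
    x, *_ = np.linalg.lstsq(s[:, None] * A, s * b, rcond=None)
    return x
```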

Weighting for the ill-conditioned or rank-deficient case

If the matrix A is ill-conditioned or rank-deficient, then a regularization term is often added, transforming our problem to $$\min_x \|D^{1/2}(b - Ax)\|_2^2 + \lambda \|Fx\|_2^2,$$ where F is a matrix designed to control the size of x and $\lambda > 0$ defines the relative importance of the two terms. Typical choices of the matrix F include the identity matrix and the matrices $$\begin{bmatrix} 1 & -1 & & \\ & \ddots & \ddots & \\ & & 1 & -1 \end{bmatrix} \in \mathbb{R}^{(n-1)\times n} \quad\text{and}\quad \begin{bmatrix} 1 & -2 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & -2 & 1 \end{bmatrix} \in \mathbb{R}^{(n-2)\times n},$$ that are scaled approximations to the first and the second derivative operators, respectively (Hansen, 1998). By …
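As an illustration, the sketch below builds these difference operators and solves the regularized problem by stacking the two terms into one least squares system; this dense formulation is again only a reference point and ignores the displacement structure the paper exploits:

```python
import numpy as np

def difference_operator(n, order=1):
    """First (order=1) or second (order=2) difference matrix, matching
    the scaled derivative approximations displayed above."""
    if order == 1:
        return np.eye(n - 1, n) - np.eye(n - 1, n, k=1)
    return np.eye(n - 2, n) - 2 * np.eye(n - 2, n, k=1) + np.eye(n - 2, n, k=2)

def regularized_wls(A, b, d, lam, F):
    """Solve min_x ||D^{1/2}(b - Ax)||_2^2 + lam ||F x||_2^2 by noting it
    equals an ordinary least squares problem with stacked blocks."""
    s = np.sqrt(d)
    K = np.vstack([s[:, None] * A, np.sqrt(lam) * F])
    rhs = np.concatenate([s * b, np.zeros(F.shape[0])])
    x, *_ = np.linalg.lstsq(K, rhs, rcond=None)
    return x
```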

Solving robust regression problems

In this section we apply the techniques we have developed to the robust regression problem (1). It is possible to add a regularization term, but for ease of exposition, we omit it.

We consider solving our minimization problem (1) by Newton's method as, for example, in O’Leary (1990). We compute the gradient vector z defined by $z_j = \rho'(r_j)$ and define $D(r)$ to be a diagonal matrix with entries $d_{jj} = \rho''(r_j)$. (At points where the function fails to be differentiable, standard techniques, such as use of …
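A dense sketch of this Newton iteration follows. Here drho and d2rho are the elementwise $\rho'$ and $\rho''$ of whichever loss function is chosen; the small floor on $\rho''$ is one common safeguard (not necessarily the paper's) for points where the second derivative vanishes, as with the Talwar function outside its cutoff:

```python
import numpy as np

def robust_newton(A, b, drho, d2rho, x0=None, iters=20, tol=1e-10):
    """Newton's method for min_x sum_i rho(r_i), with r = b - A x.
    Gradient: -A^T z with z_j = rho'(r_j); Hessian: A^T D A with
    d_jj = rho''(r_j).  Dense sketch -- the paper's contribution is
    performing each step fast when A has small displacement rank."""
    x = np.zeros(A.shape[1]) if x0 is None else x0.astype(float).copy()
    for _ in range(iters):
        r = b - A @ x
        z = drho(r)
        d = np.maximum(d2rho(r), 1e-8)      # floor keeps A^T D A nonsingular
        dx = np.linalg.solve(A.T @ (d[:, None] * A), A.T @ z)
        x += dx
        if np.linalg.norm(dx) <= tol * max(1.0, np.linalg.norm(x)):
            break
    return x
```

For the Huber function, for instance, one would pass drho = lambda r: np.clip(r, -beta, beta) and d2rho = lambda r: (np.abs(r) <= beta).astype(float).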

Numerical examples

We performed some numerical experiments with the algorithms designed in this paper, considering an example taken from a paper on Toeplitz least squares problems with no outliers (Ng, 1996).

Example 1

This example concerns finite impulse response (FIR) system identification, which has wide applications in engineering. An input signal drives the unknown system to produce an output sequence, and we model the unknown system as an FIR filter. If the unknown system is …
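A minimal synthetic stand-in for this setup is sketched below; the filter length, signal statistics, and noise/outlier levels are illustrative choices, not those of the Ng (1996) example:

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(1)
m, n = 200, 8                          # output samples, FIR filter length
h_true = rng.standard_normal(n)        # unknown FIR coefficients to recover
u = rng.standard_normal(m + n - 1)     # input signal driving the system

# Toeplitz data matrix: row t holds u(t), u(t-1), ..., u(t-n+1).
A = toeplitz(u[n - 1:], u[n - 1::-1])
b = A @ h_true + 0.01 * rng.standard_normal(m)
b[rng.choice(m, 5, replace=False)] += 5.0   # inject a few gross outliers

# A robust fit (e.g. robust_newton above with the Huber loss) recovers
# h_true far more accurately than plain least squares on this data.
```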

Conclusions

We have shown how to efficiently compute the solution to robust regression problems when the data matrix has low displacement rank, and have also explained how regularization can be included in the problem formulation. Thus, the possibility of having outliers in the data is no longer an impediment to using fast methods for structured problems.

Acknowledgments

The work of the first author was partially supported by MIUR, Grant no. 2004015437, and by the National Research Council of Italy under the Short Term Mobility Program. This author would like to acknowledge the hospitality of the Department of Computer Science and Institute for Advanced Computer Studies of the University of Maryland, where part of this work took place.

The work of the second author was supported by the National Science Foundation under Grant CCF 05-14213 and by the Department of Energy.

References (30)

  • N. Mastronardi et al., Implementation of the regularized structured total least squares algorithms for blind image deblurring, Linear Algebra Appl. (2004)
  • M. Benzi et al., Preconditioned iterative methods for weighted Toeplitz least squares problems, SIAM J. Matrix Anal. Appl. (2006)
  • Å. Björck, Numerical Methods for Least Squares Problems (1996)
  • Å. Björck et al., Solution of augmented linear systems using orthogonal factorizations, BIT (1994)
  • A.W. Bojanczyk et al., Stabilized hyperbolic Householder transformations, IEEE Trans. Acoust. Speech Signal Process. (1989)
  • S. Chandrasekaran et al., Stabilizing the generalized Schur algorithm, SIAM J. Matrix Anal. Appl. (1996)
  • D. Coleman et al., A system of subroutines for iteratively reweighted least squares computations, ACM Trans. Math. Software (1980)
  • N. Dyn et al., The numerical solution of equality-constrained quadratic programming problems, Math. Comp. (1983)
  • R.C. Fair, On the robust estimation of econometric models, Ann. Econom. Social Measurement (1974)
  • H.R. Fang, D.P. O’Leary, Modified Cholesky algorithms: a catalog with new approaches (2006), submitted for...
  • G.H. Golub et al., Matrix Computations (1996)
  • P.C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion (1998)
  • S. Haykin, Adaptive Filter Theory (2001)
  • M.J. Hinich et al., A simple method for robust regression, J. Amer. Statist. Assoc. (1975)
  • P.W. Holland et al., Robust regression using iteratively reweighted least-squares, Comm. Statist. Theory Methods A (1977)