Fast robust regression algorithms for problems with Toeplitz structure

https://doi.org/10.1016/j.csda.2007.05.008

Abstract

The problem of computing an approximate solution of an overdetermined system of linear equations is considered. The usual approach to the problem is least squares, in which the 2-norm of the residual is minimized. This produces the minimum variance unbiased estimator of the solution when the errors in the observations are independent and normally distributed with mean 0 and constant variance. It is well known, however, that the least squares solution is not robust if outliers occur, i.e., if some of the observations are contaminated by large error. In this case, alternative approaches have been proposed which judge the size of the residual in a way that is less sensitive to these components. These include the Huber M-function, the Talwar function, the logistic function, the Fair function, and the 1-norm. New algorithms are proposed to compute the solution to these problems efficiently, in particular, when the coefficient matrix A has small displacement rank. Matrices with small displacement rank include matrices that are Toeplitz, block-Toeplitz, block-Toeplitz with Toeplitz blocks, Toeplitz plus Hankel, and a variety of other forms. For exposition, only Toeplitz matrices are considered, but the ideas apply to all matrices with small displacement rank. Algorithms are also presented to compute the solution efficiently when a regularization term is included to handle the case when the coefficient matrix is ill-conditioned or rank-deficient. The techniques are illustrated on a problem of FIR system identification.

Introduction

Consider the approximation problem $Ax \approx b$, where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ ($m \geq n$) are given and $x \in \mathbb{R}^n$ is to be determined. We define the residual $r = b - Ax$. The usual approach to the problem is least squares, in which we minimize the 2-norm of the residual over all choices of x. This produces the minimum variance unbiased estimator of the solution when the errors in the observation b are independent and normally distributed with mean 0 and constant variance.
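For concreteness, here is a minimal numerical sketch of this least squares setup in Python/NumPy; the data are synthetic and purely illustrative:

```python
import numpy as np

# A small synthetic overdetermined system: m = 6 observations, n = 2 unknowns.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2))
x_true = np.array([1.0, -2.0])
b = A @ x_true + 0.01 * rng.standard_normal(6)

# Ordinary least squares: minimize ||b - A x||_2 over all x in R^n.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ x_ls                  # the residual, as defined above
print(x_ls, np.linalg.norm(r))
```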

It is well known, however, that the least squares solution is not robust if outliers occur, i.e., if some of the components of b are contaminated by large error. In this case, alternative approaches have been proposed which judge the size of the residual in a way that is less sensitive to these components. For example, in order to reduce the influence of the outliers, we might replace the least squares problem $\min_x \|r\|_2$ by $$\min_x \sum_{i=1}^{m} \rho(r_i(x)), \quad \text{subject to } r = b - Ax,$$ where $\rho$ is a given function. This robust regression problem has been extensively studied. Taking $\rho(z) = z^2/2$ gives the ordinary linear least squares problem. Other possible functions include the Huber M-function, the Talwar function, the logistic function, the Fair function, and the 1-norm; one common parameterization of these is sketched below.

In Section 2, we consider how the solution to weighted problems can be computed efficiently, in particular, when the matrix A has small displacement rank (Kailath and Sayed, 1999, Kailath and Sayed, 1995). Matrices with small displacement rank include matrices that are Toeplitz, block-Toeplitz, block-Toeplitz with Toeplitz blocks (BTTB), Toeplitz plus Hankel, and a variety of other forms. This structure has been effectively exploited in solving least squares problems (Kailath and Sayed, 1999, Ng, 1996), weighted least squares problems (Benzi and Ng, 2006), total least squares problems (Kalsi and O’Leary, 2006; Mastronardi et al., 2001a, Mastronardi et al., 2004), and regression using other norms (Pruessner and O’Leary, 2003), so we now extend these ideas to robust regression. For exposition, in Section 2 we focus on Toeplitz matrices, but the ideas apply to all matrices with small displacement rank. In Section 3, we also show how to compute the solution efficiently when we include a regularization term in case the matrix A is ill-conditioned or rank-deficient. Section 4 extends these results to robust regression problems. Experiments illustrating these ideas are presented in Section 5, and Section 6 provides our conclusions.
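The following sketch gives one standard parameterization of these loss functions together with their first derivatives, in Python/NumPy. The tuning constants are the conventional 95%-efficiency values from the robust regression literature (see Holland and Welsch, 1977), not values prescribed by this paper:

```python
import numpy as np

# Common robust loss functions rho(z) and their first derivatives rho'(z).
# Each returns (rho, drho) evaluated elementwise; beta is the scale
# (tuning) constant.  These exact forms are one standard convention.

def huber(z, beta=1.345):
    small = np.abs(z) <= beta
    rho  = np.where(small, 0.5 * z**2, beta * np.abs(z) - 0.5 * beta**2)
    drho = np.where(small, z, beta * np.sign(z))
    return rho, drho

def talwar(z, beta=2.795):
    small = np.abs(z) <= beta
    rho  = np.where(small, 0.5 * z**2, 0.5 * beta**2)
    drho = np.where(small, z, 0.0)
    return rho, drho

def fair(z, beta=1.4):
    a = np.abs(z) / beta
    rho  = beta**2 * (a - np.log1p(a))
    drho = z / (1.0 + a)
    return rho, drho

def logistic(z, beta=1.205):
    # log(cosh(x)) written via logaddexp to avoid overflow for large |x|
    x = z / beta
    rho  = beta**2 * (np.logaddexp(x, -x) - np.log(2.0))
    drho = beta * np.tanh(x)
    return rho, drho
```

Least squares corresponds to $\rho(z) = z^2/2$ everywhere; each alternative above grows more slowly than $z^2$ for large residuals, which is what blunts the influence of outliers.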

Section snippets

Weighting to solve well-conditioned problems with outliers

In this section we assume that we have a well-conditioned model in which we know which components of the vector b have large errors. This situation arises, for example, if there are known defects in a set of detectors which reduce the effectiveness of some of them without making their data worthless. In Section 3, we consider ill-conditioned problems, and in Section 4, we extend these ideas to situations in which the number and indices of the outliers are unknown.

Suppose that p of our …
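Though the snippet is truncated here, the core weighting idea admits a short sketch: given the indices of the contaminated rows of b, downweight them through a diagonal matrix D and solve the weighted least squares problem. The dense solve below is purely illustrative; it ignores the Toeplitz structure that the paper's fast algorithms exploit, and the weight value 0.1 is a hypothetical choice:

```python
import numpy as np

def weighted_ls(A, b, bad_rows, w_bad=0.1):
    """Solve min_x || D^{1/2} (b - A x) ||_2 with reduced weights on the
    rows of b known to carry large errors.  Downweighting (rather than
    deleting) those rows keeps their data without letting it dominate."""
    d = np.ones(len(b))
    d[bad_rows] = w_bad            # reduced influence for known outliers
    s = np.sqrt(d)                 # the diagonal of D^{1/2}
    x, *_ = np.linalg.lstsq(s[:, None] * A, s * b, rcond=None)
    return x
```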

Weighting for the ill-conditioned or rank-deficient case

If the matrix A is ill-conditioned or rank-deficient, then a regularization term is often added, transforming our problem to $$\min_x \|D^{1/2}(b - Ax)\|_2^2 + \lambda \|Fx\|_2^2,$$ where F is a matrix designed to control the size of x and $\lambda > 0$ defines the relative importance of the two terms. Typical choices of the matrix F include the identity matrix and the matrices $$\begin{bmatrix} 1 & -1 & & \\ & \ddots & \ddots & \\ & & 1 & -1 \end{bmatrix} \in \mathbb{R}^{(n-1)\times n} \quad\text{and}\quad \begin{bmatrix} 1 & -2 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & -2 & 1 \end{bmatrix} \in \mathbb{R}^{(n-2)\times n},$$ that are scaled approximations to the first and the second derivative operators, respectively (Hansen, 1998). By …
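As an illustration, the sketch below builds these difference operators and solves the regularized problem by stacking the two terms into one least squares system; this dense formulation is again only a reference point and ignores the displacement structure the paper exploits:

```python
import numpy as np

def difference_operator(n, order=1):
    """First (order=1) or second (order=2) difference matrix, matching
    the scaled derivative approximations displayed above."""
    if order == 1:
        return np.eye(n - 1, n) - np.eye(n - 1, n, k=1)
    return np.eye(n - 2, n) - 2 * np.eye(n - 2, n, k=1) + np.eye(n - 2, n, k=2)

def regularized_wls(A, b, d, lam, F):
    """Solve min_x ||D^{1/2}(b - Ax)||_2^2 + lam ||F x||_2^2 by noting it
    equals an ordinary least squares problem with stacked blocks."""
    s = np.sqrt(d)
    K = np.vstack([s[:, None] * A, np.sqrt(lam) * F])
    rhs = np.concatenate([s * b, np.zeros(F.shape[0])])
    x, *_ = np.linalg.lstsq(K, rhs, rcond=None)
    return x
```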

Solving robust regression problems

In this section we apply the techniques we have developed to the robust regression problem (1). It is possible to add a regularization term, but for ease of exposition, we omit it.

We consider solving our minimization problem (1) by Newton's method as, for example, in O’Leary (1990). We compute the gradient vector z defined by $z_j = \rho'(r_j)$ and define $D(r)$ to be a diagonal matrix with entries $d_{jj} = \rho''(r_j)$. (At points where the function fails to be differentiable, standard techniques, such as use of …
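A dense sketch of this Newton iteration follows. Here drho and d2rho are the elementwise $\rho'$ and $\rho''$ of whichever loss function is chosen; the small floor on $\rho''$ is one common safeguard (not necessarily the paper's) for points where the second derivative vanishes, as with the Talwar function outside its cutoff:

```python
import numpy as np

def robust_newton(A, b, drho, d2rho, x0=None, iters=20, tol=1e-10):
    """Newton's method for min_x sum_i rho(r_i), with r = b - A x.
    Gradient: -A^T z with z_j = rho'(r_j); Hessian: A^T D A with
    d_jj = rho''(r_j).  Dense sketch -- the paper's contribution is
    performing each step fast when A has small displacement rank."""
    x = np.zeros(A.shape[1]) if x0 is None else x0.astype(float).copy()
    for _ in range(iters):
        r = b - A @ x
        z = drho(r)
        d = np.maximum(d2rho(r), 1e-8)      # floor keeps A^T D A nonsingular
        dx = np.linalg.solve(A.T @ (d[:, None] * A), A.T @ z)
        x += dx
        if np.linalg.norm(dx) <= tol * max(1.0, np.linalg.norm(x)):
            break
    return x
```

For the Huber function, for instance, one would pass drho = lambda r: np.clip(r, -beta, beta) and d2rho = lambda r: (np.abs(r) <= beta).astype(float).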

Numerical examples

We performed some numerical experiments with the algorithms designed in this paper, considering an example taken from a paper on Toeplitz least squares problems with no outliers (Ng, 1996).

Example 1

This example concerns finite impulse response (FIR) system identification, which has wide applications in engineering. An input signal drives the unknown system to produce an output sequence, and we model the unknown system as an FIR filter. If the unknown system is …
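A minimal synthetic stand-in for this setup is sketched below; the filter length, signal statistics, and noise/outlier levels are illustrative choices, not those of the Ng (1996) example:

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(1)
m, n = 200, 8                          # output samples, FIR filter length
h_true = rng.standard_normal(n)        # unknown FIR coefficients to recover
u = rng.standard_normal(m + n - 1)     # input signal driving the system

# Toeplitz data matrix: row t holds u(t), u(t-1), ..., u(t-n+1).
A = toeplitz(u[n - 1:], u[n - 1::-1])
b = A @ h_true + 0.01 * rng.standard_normal(m)
b[rng.choice(m, 5, replace=False)] += 5.0   # inject a few gross outliers

# A robust fit (e.g. robust_newton above with the Huber loss) recovers
# h_true far more accurately than plain least squares on this data.
```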

Conclusions

We have shown how to efficiently compute the solution to robust regression problems when the data matrix has low displacement rank, and have also explained how regularization can be included in the problem formulation. Thus, the possibility of having outliers in the data is no longer an impediment to using fast methods for structured problems.

Acknowledgments

The work of the first author was partially supported by MIUR, Grant no. 2004015437, and by the National Research Council of Italy under the Short Term Mobility Program. This author would like to acknowledge the hospitality of the Department of Computer Science and Institute for Advanced Computer Studies of the University of Maryland, where part of this work took place.

The work of the second author was supported by the National Science Foundation under Grant CCF 05-14213 and by the Department of Energy.

References (30)

  • N. Mastronardi et al., Implementation of the regularized structured total least squares algorithms for blind image deblurring, Linear Algebra Appl. (2004)
  • M. Benzi et al., Preconditioned iterative methods for weighted Toeplitz least squares problems, SIAM J. Matrix Anal. Appl. (2006)
  • Å. Björck, Numerical Methods for Least Squares Problems (1996)
  • Å. Björck et al., Solution of augmented linear systems using orthogonal factorizations, BIT (1994)
  • A.W. Bojanczyk et al., Stabilized hyperbolic Householder transformations, IEEE Trans. Acoust. Speech Signal Process. (1989)
  • S. Chandrasekaran et al., Stabilizing the generalized Schur algorithm, SIAM J. Matrix Anal. Appl. (1996)
  • D. Coleman et al., A system of subroutines for iteratively reweighted least squares computations, ACM Trans. Math. Software (1980)
  • N. Dyn et al., The numerical solution of equality-constrained quadratic programming problems, Math. Comp. (1983)
  • R.C. Fair, On the robust estimation of econometric models, Ann. Econom. Social Measurement (1974)
  • H.R. Fang, D.P. O’Leary, Modified Cholesky algorithms: a catalog with new approaches (2006), submitted for...
  • G.H. Golub et al., Matrix Computations (1996)
  • P.C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion (1998)
  • S. Haykin, Adaptive Filter Theory (2001)
  • M.J. Hinich et al., A simple method for robust regression, J. Amer. Statist. Assoc. (1975)
  • P.W. Holland et al., Robust regression using iteratively reweighted least-squares, Comm. Statist. Theory Methods A (1977)