Elsevier

Signal Processing

Volume 174, September 2020, 107608

A robust adaptive Lasso estimator for the independent contamination model

https://doi.org/10.1016/j.sigpro.2020.107608

Highlights

  • Robust variable selection and parameter estimation in the presence of outliers.

  • A few highly contaminated predictors can cause existing robust estimators to break down.

  • The Independent Contamination Model is a relatively new and realistic model for outliers.

  • Combining a robust loss function with a sparsity and robustness inducing penalty term.

  • Determining the temporal releases of the European Tracer Experiment (ETEX).

Abstract

The Lasso has become a benchmark method for simultaneous parameter estimation and variable selection in regression analysis. It is based on the least-squares estimator and, therefore, suffers from the presence of outliers. Robust Lasso methods combine the objective function of a robust estimator with ℓ1-penalization. We address robustness for cases in which the number of observations is smaller (or not much larger) than the number of predictors. Further, we assume that the regression matrix may contain cellwise outliers. In such settings, even a few highly contaminated predictors can cause existing robust methods that are based on the commonly used rowwise contamination model to break down. Therefore, we propose a new adaptive Lasso type regularization. It takes into account cellwise outlyingness in the regression matrix and uses this information for robust variable selection. The proposed regularization term is integrated into the objective function of the MM-estimator, which yields the proposed MM-Robust Weighted Adaptive Lasso (MM-RWAL). A performance comparison to existing robust Lasso estimators is provided using Monte Carlo experiments. Further, the MM-RWAL is applied to determine the temporal releases of the European Tracer Experiment (ETEX) at the source location. This sparse and ill-conditioned linear inverse problem contains cellwise and rowwise outliers.

Introduction

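Many of today’s signal processing problems can be formulated as a linear regression model

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u},$$

where $\mathbf{X} \in \mathbb{R}^{n \times p}$ is the predictor matrix, $\mathbf{u} \in \mathbb{R}^{n \times 1}$ is the error vector, $\mathbf{y} \in \mathbb{R}^{n \times 1}$ is the response vector, and $\boldsymbol{\beta} \in \mathbb{R}^{p \times 1}$ is the unknown parameter vector.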

The presence of outliers and impulsive noise in linear regression problems has been reported in applications as diverse as wireless communication, ultrasonic systems, computer vision, electric power systems, automated detection of defects, biomedical signal analysis, genomics and the estimation of the temporal releases of a pollutant to the atmosphere. See [1], [2], [3], [4], [5], [6], [7], [8], [9], [10] and references therein. It is well-known that violations of the Gaussian noise assumption cause a drastic performance drop for the commonly used least-squares estimator (LSE) [11], [12], [13]

$$\hat{\boldsymbol{\beta}}_{\mathrm{LSE}} = \underset{\boldsymbol{\beta}}{\arg\min}\; \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|_2^2.$$

This has led to the development of robust estimators. For decades, the vast majority of robust linear regression estimators have focused on robustness against rowwise contamination. Under the so-called Tukey-Huber contamination model (THCM) [12], even for high-breakdown regression estimators, such as the LTS-, S-, MM-, and τ-estimators [1], [12], only a minority of the rows of X may be contaminated. In [14], Rousseeuw and Van den Bossche state that the outlying rows paradigm is no longer sufficient for modern high-dimensional data sets. It often happens that most data cells (entries) in a row are regular and just a few of them are anomalous.

The case that independent cells of X are outliers is referred to as the independent contamination model (ICM) [15], [16], [17]. Only recently, the first cellwise robust regression estimation methods have been developed [16], [17]. Developing new cellwise robust estimators is essential for solving many real-world problems. For example, the estimation of the spatio-temporal emissions of a pollutant, given noisy observations, can be formulated as a linear inverse problem with the help of an atmospheric dispersion model [7]. One such data set, from the European Tracer Experiment (ETEX), which was conducted in Monterfil, Brittany, in 1994 and in which perfluorocarbon (PFC) tracers were released into the atmosphere, contains both cellwise and rowwise outliers.

In addition to the robustness considerations, atmospheric inverse problems, like many other problems in signal processing, require finding sparse solutions to under-determined, or ill-conditioned, linear systems of equations. For example, handling large data sets in terms of model interpretation, including the case where the number of explanatory variables p is larger than the sample size n, requires penalized estimators, such as the classical least absolute shrinkage and selection operator (Lasso) [18]

$$\hat{\boldsymbol{\beta}}_{\mathrm{Lasso}} = \underset{\boldsymbol{\beta}}{\arg\min}\; \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|_2^2 + \lambda \|\boldsymbol{\beta}\|_1,$$

with $\lambda \in \mathbb{R}^{+}$.
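As an illustration only, the Lasso objective above can be minimized by cyclic coordinate descent with soft-thresholding. The following NumPy sketch is not taken from the paper; the function names `soft_threshold` and `lasso_cd` and the fixed number of sweeps are assumptions made for this example. Because the squared loss here carries no factor 1/2, the per-coordinate threshold is λ/2.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize ||y - X b||_2^2 + lam * ||b||_1 by cyclic coordinate descent.

    Illustrative sketch: n_iter full sweeps, no convergence check.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)      # ||x_j||_2^2 for every column
    resid = y - X @ beta                 # current residual y - X beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                 # an all-zero column carries no information
            # x_j^T (partial residual with coordinate j removed)
            rho = X[:, j] @ resid + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam / 2.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new_bj)   # keep the residual in sync
            beta[j] = new_bj
    return beta
```

For an orthonormal design (for instance X equal to the identity matrix), this reduces to soft-thresholding the entries of y at λ/2, which gives a quick sanity check of the implementation.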

Many other regularizations have been proposed [19], [20], [21], [22], [23]. In this paper the focus lies on Lasso estimation, to select a robust and interpretable model in high-dimensional settings. Zou [24] showed that the Lasso variable selection can be inconsistent, so that the oracle properties do not hold, and proposed the adaptive Lasso

$$\hat{\boldsymbol{\beta}}_{\mathrm{Lasso}}^{\mathrm{ad}} = \underset{\boldsymbol{\beta}}{\arg\min}\; \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|_2^2 + \lambda \sum_{j=1}^{p} \hat{w}_j |\beta_j|,$$

where $\hat{w}_j = 1/|\hat{\beta}_j|^{\gamma}$ ($\gamma > 0$) are non-negative weights depending on $\hat{\boldsymbol{\beta}}$, which is a $\sqrt{n}$-consistent estimator of $\boldsymbol{\beta}$. For a thorough discussion on consistency of estimators, see [25].
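The weighted penalty of the adaptive Lasso only changes the threshold applied to each coordinate, from λ/2 to λŵ_j/2. The sketch below illustrates this with a plain coordinate-descent solver. It is a non-robust illustration and not the method proposed in the paper; the function names are hypothetical, and using ordinary least squares as the pilot estimate is an assumption of this example that is only sensible when n > p.

```python
import numpy as np

def weighted_lasso_cd(X, y, lam, w, n_iter=500):
    """Minimize ||y - X b||_2^2 + lam * sum_j w[j] * |b[j]| by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)
    resid = y.astype(float).copy()       # residual for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0 or not np.isfinite(w[j]):
                continue                 # infinite weight: coefficient stays at zero
            rho = X[:, j] @ resid + col_sq[j] * beta[j]
            thr = lam * w[j] / 2.0       # coordinate-specific threshold
            new_bj = np.sign(rho) * max(abs(rho) - thr, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta

def adaptive_lasso(X, y, lam, gamma=1.0):
    """Adaptive Lasso with weights w_j = 1 / |pilot_j|^gamma."""
    # Pilot estimate: ordinary least squares (assumption of this sketch, needs n > p).
    beta_pilot, *_ = np.linalg.lstsq(X, y, rcond=None)
    with np.errstate(divide="ignore"):
        w = 1.0 / np.abs(beta_pilot) ** gamma
    return weighted_lasso_cd(X, y, lam, w)
```

Coefficients with a large pilot estimate receive a small weight and are shrunk less, while coefficients whose pilot estimate is close to zero are penalized heavily and tend to be removed from the model.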

Just like the LSE, the Lasso and the adaptive Lasso rely on the Gaussian noise assumption and are sensitive to outliers. In recent years, some robust and regularized approaches have been proposed that replace the penalized square objective function by a penalized bounded objective function [8], [10], [26], [27], [28]. These methods, however, again rely on the THCM.

Especially in cases where the number of predictors p exceeds the number of rows n, it becomes more and more likely that a few highly contaminated predictors force the THCM-based estimators to flag all data points as outliers, which makes it impossible to draw any inferences from the data. This scenario is illustrated in Fig. 1, where a predictor matrix with n data points and p variables is depicted. In this example, the first predictor contains two outliers and the second predictor contains three outliers. For classical robust estimators such as the least trimmed squares estimator (LTS-estimator) [29] or the MM-estimator [12] and their sparsity inducing counterparts, namely the Sparse LTS-estimator [27] and the Sparse MM-estimator [28], the depicted predictor matrix appears to be fully contaminated. The reason for this is that, for these estimators, a single contaminated cell in the predictor matrix leads to flagging the whole corresponding data point as an outlier, which is then removed (LTS-estimator and Sparse LTS-estimator) or downweighted (MM-estimator and Sparse MM-estimator). This is depicted in the lower part of Fig. 1.
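A short calculation makes this effect concrete. Assuming, for illustration, that every cell of the predictor matrix is contaminated independently with the same probability ε (a Bin(1, ε) indicator per cell), a row of p cells contains at least one contaminated cell with probability 1 − (1 − ε)^p. The sketch below evaluates this quantity; the function names are hypothetical and the equal-probability assumption is a simplification made for this example.

```python
def prob_row_contaminated(eps, p):
    """P(at least one of the p cells in a row is contaminated), cells independent."""
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    return 1.0 - (1.0 - eps) ** p

def max_cell_rate_for_clean_majority(p):
    """Largest eps for which the expected share of contaminated rows is at most 1/2."""
    return 1.0 - 0.5 ** (1.0 / p)

if __name__ == "__main__":
    # Print the row-contamination probability at eps = 0.05 and the
    # admissible cell rate for a few dimensions p.
    for p in (5, 50, 500):
        print(p,
              round(prob_row_contaminated(0.05, p), 3),
              round(max_cell_rate_for_clean_majority(p), 4))
```

Under this assumption, the admissible cellwise contamination rate for a rowwise robust estimator shrinks quickly as p grows, which is why even a small share of outlying cells can make almost every row appear contaminated.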

Such scenarios require methods that flag highly contaminated predictors and decrease their likelihood of being selected as active variables by Lasso type estimators.

Original Contributions: We propose and analyze a robustly weighted and adaptive Lasso type regularization term, which takes into account cellwise outliers for model selection. The proposed regularization term is integrated into the objective function of the MM-estimator, which yields the proposed MM-Robust Weighted Adaptive Lasso (MM-RWAL). The RWAL penalty can easily be integrated into the objective function of other robust estimators. A performance comparison to existing robust Lasso estimators is provided using Monte Carlo experiments. Further, a real-data application of estimating the sparse non-negative spatio-temporal emissions of a pollutant, given noisy observations y and an imprecisely estimated ill-conditioned and sparse dispersion model X, is considered. This example contains both cellwise and rowwise outliers.

Notation: Scalars are denoted by lowercase letters, e.g., $x$, column vectors by bold-faced lowercase letters, e.g., $\mathbf{x}$, matrices by bold-faced uppercase letters, e.g., $\mathbf{X}$, and sets by calligraphic letters, e.g., $\mathcal{X}$, with associated cardinality $|\mathcal{X}|$. The $j$th column of a matrix $\mathbf{X}$ is denoted by $\mathbf{x}_j$, while $(\mathbf{x})_{i:j}$ denotes the vector that contains the entries $i$ to $j$ of vector $\mathbf{x}$. The $i$th element of vector $\mathbf{x}$ is denoted by $x_i$, $\mathbf{I}_p$ is the $p$-dimensional identity matrix, $\mathbf{0}_p$ is the $p$-dimensional all-zeros vector, and $\mathrm{diag}(\mathbf{x})$ forms a matrix that contains the entries of $\mathbf{x}$ as its diagonal. $\hat{\boldsymbol{\beta}}$ refers to the estimator (or estimate) of the parameter vector $\boldsymbol{\beta}$, and $(\cdot)^{\top}$ is the transpose operator. The derivative of a function $f$ with respect to its argument is abbreviated by $f'$. $P(X)$ is the probability of event $X$. $\mathrm{Bin}(1, \epsilon)$ denotes the binomial distribution with one trial and a success probability of $\epsilon$. Convergence to the normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ is denoted by $\xrightarrow{d} \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$.

Organization: Section 2 discusses the Tukey-Huber and Independent Contamination models and motivates the use of cellwise robust methods. Section 3 introduces the proposed estimator and provides algorithms to compute the estimates. Section 4 provides numerical experiments, while Section 5 contains a real-data application of source estimation for an atmospheric inverse problem. Finally, Section 6 concludes the paper with a brief outlook on future work.

Section snippets

Tukey-Huber and independent contamination models

In this Section, we will review the Tukey-Huber and Independent Contamination models and discuss their advantages and drawbacks in modeling outliers in the data.

Proposed method

In this section, we introduce a new method called the MM-Robust Weighted Adaptive Lasso (MM-RWAL).

Simulation setup

Two different Monte Carlo studies are conducted, to assess the performance of the proposed MM-RWAL.

Scenario 1: p > n, correlated predictors, cellwise outliers

A setup with $p = 50$ predictors and $n = 30$ observations is considered. The regression parameters are defined by $\beta_j = j/5$ for $j \in \{1, \dots, 5\}$, while $\beta_j = 0$ for $j \in \{6, \dots, 50\}$. Correlated predictors $\mathbf{x}_j$, $j \in \{1, \dots, p\}$, are generated by sampling from a multivariate zero-mean Gaussian distribution with covariance matrix $\Sigma_{ij} = 0.5^{|i-j|}$, $i, j \in \{1, \dots, p\}$. The errors $u_i$ are zero mean
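The part of this setup that is visible in the snippet can be sketched in NumPy as follows. The sketch reads the coefficient definition as β_j = j/5 for j ≤ 5 and the covariance as Σ_ij = 0.5^|i−j|. It generates only the clean predictors and the coefficient vector; the error distribution and the outlier mechanism are cut off in the snippet and are deliberately not filled in. The function name `scenario1_design` and the seed handling are assumptions of this example.

```python
import numpy as np

def scenario1_design(n=30, p=50, rho=0.5, seed=0):
    """Clean predictors and true coefficients for a p > n setup (illustrative)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(1, p + 1)                       # 1-based predictor index j
    # Covariance with entries rho^{|i - j|}
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    # n rows drawn from a zero-mean multivariate Gaussian with covariance Sigma
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    # beta_j = j/5 for j = 1..5, zero otherwise
    beta = np.where(idx <= 5, idx / 5.0, 0.0)
    return X, beta, Sigma

X, beta, Sigma = scenario1_design()
signal = X @ beta   # noise-free part of the response; errors are not modeled here
```

Only five of the fifty coefficients are non-zero, so the design is both sparse and under-determined (p > n), which is the regime the scenario title refers to.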

A Real Data example of source estimation for an atmospheric inverse problem

Quantifying the emissions of a pollutant into the atmosphere is essential, for example, in the case of nuclear power plant accidents, volcano eruptions, or to track the releases of greenhouse gases. In this paper, we apply penalized robust estimation to determine the temporal releases of the particles of the European Tracer Experiment (ETEX) at the source location. During the ETEX experiment, tracers (perfluorocarbons) were released into the atmosphere in Monterfil, Brittany, in 1994. Hourly

Conclusion

The problem of finding sparse solutions to under-determined, or ill-conditioned, linear regression problems that are contaminated by cellwise and rowwise outliers was investigated. We introduced and analyzed a robustly weighted and adaptive Lasso type regularization term and integrated it into the objective function of the MM-estimator, resulting in the proposed MM-Robust Weighted Adaptive Lasso (MM-RWAL). The regularization term takes into account cellwise outlyingness in the regression matrix

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The authors would like to thank Marta Martinez-Camara and Martin Vetterli for making us aware of the ETEX experiment and for many interesting discussions on this application.

The work of Jasin Machkour has been funded by the German Research Foundation (DFG) under grant number 425884435.

The work of Michael Muma has been funded by the LOEWE initiative (Hesse, Germany) within the emergenCITY centre and is supported by the ‘Athene Young Investigator Programme’ of Technische Universität Darmstadt.

References (37)

  • M. Martinez-Camara et al., A robust method for inverse transport modeling of atmospheric emissions using blind outlier detection, Geosci. Model Dev. (2014)

  • M. Martinez-Camara et al., A new robust and efficient estimator for ill-conditioned linear inverse problems with outliers, Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (2015)

  • W.-J. Zeng et al., Outlier-robust greedy pursuit algorithms in ℓp-space for sparse approximation, IEEE Trans. Signal Process. (2016)

  • E. Ollila, Adaptive lasso based on joint M-estimation of regression and scale, Proc. Eur. Signal Process. Conf. (2016)

  • F.R. Hampel et al., Robust Statistics: The Approach Based on the Influence Function (2005)

  • R.A. Maronna et al., Robust Statistics (2006)

  • P.J. Huber et al., Robust Statistics (2009)

  • P.J. Rousseeuw, W. Van den Bossche, Detecting deviating data cells,...
