A robust adaptive Lasso estimator for the independent contamination model

doi:10.1016/j.sigpro.2020.107608

Signal Processing

Volume 174, September 2020, 107608

https://doi.org/10.1016/j.sigpro.2020.107608

Highlights

• Robust variable selection and parameter estimation in the presence of outliers.
• A few highly contaminated predictors can cause existing robust estimators to break down.
• The Independent Contamination Model is a relatively new and realistic model for outliers.
• Combining a robust loss function with a sparsity and robustness inducing penalty term.
• Determining the temporal releases of the European Tracer Experiment (ETEX).

Abstract

The Lasso has become a benchmark method for simultaneous parameter estimation and variable selection in regression analysis. It is based on the least-squares estimator and, therefore, suffers from the presence of outliers. Robust Lasso methods combine the objective function of a robust estimator with ℓ₁-penalization. We address robustness for cases in which the number of observations is smaller (or not much larger) than the number of predictors. Further, we assume that the regression matrix may contain cellwise outliers. In such settings, even a few highly contaminated predictors can cause existing robust methods that are based on the commonly used rowwise contamination model to break down. Therefore, we propose a new adaptive Lasso type regularization. It takes into account cellwise outlyingness in the regression matrix and uses this information for robust variable selection. The proposed regularization term is integrated into the objective function of the MM-estimator, which yields the proposed MM-Robust Weighted Adaptive Lasso (MM-RWAL). A performance comparison to existing robust Lasso estimators is provided using Monte Carlo experiments. Further, the MM-RWAL is applied to determine the temporal releases of the European Tracer Experiment (ETEX) at the source location. This sparse and ill-conditioned linear inverse problem contains cellwise and rowwise outliers.

Introduction

Many of today’s signal processing problems can be formulated as a linear regression model $y = X β + u,$ where $X \in \mathbb{R}^{n \times p}$ is the predictor matrix, $u \in \mathbb{R}^{n \times 1}$ is the error vector, $y \in \mathbb{R}^{n \times 1}$ is the response vector, and $β \in \mathbb{R}^{p \times 1}$ is the unknown parameter vector.

The presence of outliers and impulsive noise in linear regression problems has been reported in applications as diverse as wireless communication, ultrasonic systems, computer vision, electric power systems, automated detections of defects, biomedical signal analysis, genomics and the estimation of the temporal releases of a pollutant to the atmosphere. See [1], [2], [3], [4], [5], [6], [7], [8], [9], [10] and references therein. It is well-known that violations of the Gaussian noise assumption cause a drastic performance drop for the commonly used least-squares estimator (LSE) [11], [12], [13] ${\hat{β}}_{LSE} = \underset{β}{arg min} ∥ y - X β ∥_{2}^{2} .$
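
The sensitivity of the LSE can be illustrated with a small NumPy sketch. The dimensions, the true parameter vector, the noise level, and the size of the single outlier are all hypothetical values chosen for illustration; they are not taken from the paper.

```python
import numpy as np

# Hypothetical toy problem: y = X beta + u with Gaussian noise.
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.standard_normal((n, p))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + 0.1 * rng.standard_normal(n)


def lse(X, y):
    """Least-squares estimate: argmin_b ||y - X b||_2^2."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Estimate on clean data.
beta_clean = lse(X, y)

# Corrupt one single response value and estimate again.
y_out = y.copy()
y_out[0] += 1000.0
beta_out = lse(X, y_out)

# Euclidean estimation errors with and without the outlier.
err_clean = np.linalg.norm(beta_clean - beta)
err_out = np.linalg.norm(beta_out - beta)
print(f"error without outlier: {err_clean:.3f}")
print(f"error with one outlier: {err_out:.3f}")
```

Because the squared loss grows without bound in the residual, one gross error in y is enough to pull the estimate far away from β, which is the behaviour that robust estimators are designed to avoid.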

This has led to the development of robust estimators. For decades, the vast majority of robust linear regression estimators have focused on robustness against rowwise contamination. Under the so-called Tukey-Huber contamination model (THCM) [12], even for high-breakdown regression estimators, such as the LTS-, S-, MM-, and τ-estimators [1], [12], only a minority of the rows of X may be contaminated. In [14], Rousseeuw and Van den Bossche state that the outlying rows paradigm is no longer sufficient for modern high-dimensional data sets. It often happens that most data cells (entries) in a row are regular and just a few of them are anomalous.

The case in which independent cells of X are outliers is referred to as the independent contamination model (ICM) [15], [16], [17]. Only recently, the first cellwise robust regression estimation methods have been developed [16], [17]. Developing new cellwise robust estimators is essential for solving many real-world problems. For example, the estimation of the spatio-temporal emissions of a pollutant, given noisy observations, can be formulated as a linear inverse problem with the help of an atmospheric dispersion model [7]. The data of the European Tracer Experiment (ETEX), which was conducted in Monterfil, Brittany, in 1994 and during which perfluorocarbon (PFC) tracers were released into the atmosphere, contain both cellwise and rowwise outliers.

In addition to the robustness considerations, atmospheric inverse problems, like many other problems in signal processing, require finding sparse solutions to under-determined, or ill-conditioned, linear systems of equations. For example, handling large data sets in terms of model interpretation, including the case where the number of explanatory variables p is larger than the sample size n, requires penalized estimators, such as the classical least absolute shrinkage and selection operator (Lasso) [18] ${\hat{β}}_{Lasso} = \underset{β}{arg min} {∥ y - X β ∥}_{2}^{2} + λ {∥ β ∥}_{1},$ with $λ \in \mathbb{R}^{+}$.
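
As a worked special case (an illustration that is not taken from the paper), assume an orthonormal design, $X^{⊤} X = I_{p}$, which requires n ≥ p. The Lasso objective then decouples across the coordinates of β, and the minimizer is a soft-thresholded version of $z = X^{⊤} y$:

```latex
\begin{aligned}
\| y - X\beta \|_2^2 + \lambda \|\beta\|_1
  &= \|y\|_2^2 + \sum_{j=1}^{p} \left( \beta_j^2 - 2 z_j \beta_j + \lambda |\beta_j| \right),
  \qquad z = X^\top y, \\
\hat{\beta}_{\mathrm{Lasso},j}
  &= \operatorname{sign}(z_j)\, \max\!\left( |z_j| - \tfrac{\lambda}{2},\, 0 \right),
  \qquad j = 1, \dots, p .
\end{aligned}
```

Every coefficient with $|z_j| \leq λ/2$ is set exactly to zero and the remaining ones are shrunk toward zero by $λ/2$, which is how the ℓ₁ penalty performs variable selection and estimation at the same time.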

Many other regularizations have been proposed [19], [20], [21], [22], [23]. In this paper, the focus lies on Lasso estimation, to select a robust and interpretable model in high-dimensional settings. Zou [24] showed that the Lasso variable selection can be inconsistent, so that the oracle properties do not hold, and proposed the adaptive Lasso ${\hat{β}}_{Lasso}^{ad} = \underset{β}{arg min} {∥ y - X β ∥}_{2}^{2} + λ \sum_{j = 1}^{p} {\hat{w}}_{j} | β_{j} |,$ where ${\hat{w}}_{j} = 1 / {| {\hat{β}}_{j} |}^{γ}$ (γ > 0) are non-negative weights depending on $\hat{β},$ which is a $\sqrt{n}$ -consistent estimator of β. For a thorough discussion on consistency of estimators, see [25].
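
A minimal NumPy sketch of the adaptive Lasso objective follows, under stated assumptions: the initial estimate $\hat{β}$ is taken to be the least-squares solution (sensible only when n > p), the solver is a plain cyclic coordinate descent, and the function names `weighted_lasso_cd` and `adaptive_lasso`, the guard `eps`, the toy data, and the values of `lam` and `gamma` are illustrative choices rather than anything prescribed by the paper.

```python
import numpy as np


def weighted_lasso_cd(X, y, lam, w, n_iter=200):
    """Minimize ||y - X b||_2^2 + lam * sum_j w_j |b_j| by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)        # ||x_j||_2^2 for every column
    resid = np.array(y, dtype=float)       # residual y - X beta (beta starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]     # take coordinate j out of the current fit
            rho = X[:, j] @ resid
            # Soft-thresholding; the threshold is lam * w_j / 2 because the
            # squared loss in the objective carries no factor 1/2.
            beta[j] = np.sign(rho) * max(abs(rho) - 0.5 * lam * w[j], 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]     # put the updated coordinate back
    return beta


def adaptive_lasso(X, y, lam, gamma=1.0, eps=1e-8):
    """Adaptive Lasso with weights w_j = 1 / |beta_init_j|^gamma."""
    beta_init = np.linalg.lstsq(X, y, rcond=None)[0]   # assumed initial estimate (LSE)
    w = 1.0 / (np.abs(beta_init) ** gamma + eps)       # eps guards against division by zero
    return weighted_lasso_cd(X, y, lam, w)


# Hypothetical toy data: 2 active and 8 inactive predictors, Gaussian noise.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [3.0, -2.0]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

beta_hat = adaptive_lasso(X, y, lam=20.0)
support = np.flatnonzero(beta_hat)   # indices of the selected predictors
print("selected predictors:", support)
```

Coefficients with a small initial estimate receive a large weight and are therefore penalized heavily, while large coefficients are barely shrunk. Setting all weights to one recovers the ordinary Lasso.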

Just like the LSE, the Lasso and the adaptive Lasso rely on the Gaussian noise assumption and are sensitive to outliers. In recent years, some robust and regularized approaches have been proposed that replace the penalized squared-error objective function by a penalized bounded objective function [8], [10], [26], [27], [28]. These methods, however, again rely on the THCM.
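
To make the notion of a bounded objective function concrete, the sketch below compares the squared loss with Tukey's bisquare ρ-function, a common bounded loss in robust regression. The choice of this particular ρ and of the tuning constant c = 4.685 (the value usually quoted for 95% efficiency under Gaussian noise) is an assumption made for illustration; the cited methods may use other bounded losses.

```python
import numpy as np


def squared_loss(r):
    """Unbounded squared loss r^2."""
    r = np.asarray(r, dtype=float)
    return r ** 2


def tukey_bisquare_rho(r, c=4.685):
    """Tukey's bisquare rho: grows like a quadratic near 0, constant c^2/6 for |r| > c."""
    r = np.asarray(r, dtype=float)
    inside = (c ** 2 / 6.0) * (1.0 - (1.0 - (r / c) ** 2) ** 3)
    return np.where(np.abs(r) <= c, inside, c ** 2 / 6.0)


# Residuals in units of the (robustly estimated) noise scale.
residuals = np.array([0.0, 1.0, 5.0, 50.0])
sq = squared_loss(residuals)
bi = tukey_bisquare_rho(residuals)
for r, a, b in zip(residuals, sq, bi):
    print(f"r = {r:5.1f}   squared = {a:8.1f}   bisquare = {b:6.3f}")
```

A residual of 50 contributes 2500 to the squared loss but no more than c²/6 to the bounded loss, so a single gross outlier cannot dominate the objective.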

Especially in cases where the number of predictors p exceeds the number of rows n, it becomes more and more likely that a few highly contaminated predictors force the THCM-based estimators to flag all data points as outliers, which makes it impossible to draw any inferences from the data. This scenario is illustrated in Fig. 1, where a predictor matrix with n data points and p variables is depicted. In this example, the first predictor contains two outliers and the second predictor contains three outliers. For classical robust estimators such as the least trimmed squares estimator (LTS-estimator) [29] or the MM-estimator [12] and their sparsity-inducing counterparts, namely the Sparse LTS-estimator [27] and the Sparse MM-estimator [28], the depicted predictor matrix appears to be fully contaminated. The reason for this is that, for these estimators, a single contaminated cell in the predictor matrix leads to flagging the whole corresponding data point as an outlier, which is then removed (LTS-estimator and Sparse LTS-estimator) or downweighted (MM-estimator and Sparse MM-estimator). This is depicted in the lower part of Fig. 1.
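
The effect can be quantified with a short sketch. It assumes, as a simplification, that every cell of the predictor matrix is contaminated independently with the same probability ε, so that a row contains at least one contaminated cell with probability 1 − (1 − ε)^p. The values of ε, n, and p and the function name `prob_row_contaminated` are hypothetical.

```python
import numpy as np


def prob_row_contaminated(eps, p):
    """P(at least one of p independent cells in a row is contaminated)."""
    return 1.0 - (1.0 - eps) ** p


eps, n, p = 0.05, 10_000, 50
rng = np.random.default_rng(0)

# Bernoulli(eps) indicator for every cell: True marks a contaminated cell.
mask = rng.binomial(1, eps, size=(n, p)).astype(bool)

frac_cells = mask.mean()             # fraction of contaminated cells
frac_rows = mask.any(axis=1).mean()  # fraction of rows with >= 1 contaminated cell
theory = prob_row_contaminated(eps, p)

print(f"contaminated cells: {frac_cells:.3f}")
print(f"rows with at least one contaminated cell: {frac_rows:.3f} (theory {theory:.3f})")
```

With only 5% of the cells contaminated and p = 50, roughly 92% of the rows contain at least one contaminated cell, far more than the minority of rows that a rowwise robust estimator can tolerate.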

Such scenarios require methods that flag highly contaminated predictors and decrease their likelihood of being selected as active variables by Lasso type estimators.

Original Contributions: We propose and analyze a robustly weighted and adaptive Lasso type regularization term, which takes into account cellwise outliers for model selection. The proposed regularization term is integrated into the objective function of the MM-estimator, which yields the proposed MM-Robust Weighted Adaptive Lasso (MM-RWAL). The RWAL penalty can easily be integrated into the objective function of other robust estimators. A performance comparison to existing robust Lasso estimators is provided using Monte Carlo experiments. Further, a real-data application of estimating the sparse non-negative spatio-temporal emissions of a pollutant, given noisy observations $y$ and an imprecisely estimated ill-conditioned and sparse dispersion model X, is considered. This example contains both cellwise and rowwise outliers.

Notation: Scalars are denoted by lowercase letters, e.g., $x$, column vectors by bold-faced lowercase letters, e.g., $\mathbf{x}$, matrices by bold-faced uppercase letters, e.g., $\mathbf{X}$, and sets by calligraphic letters, e.g., $\mathcal{X}$ with associated cardinality $| \mathcal{X} |$. The jth column of a matrix $\mathbf{X}$ is denoted by $\mathbf{x}_{j}$, while $(\mathbf{x})_{i:j}$ denotes the vector that contains the entries i to j of vector $\mathbf{x}$. The ith element of vector $\mathbf{x}$ is denoted by $x_{i}$, $I_{p}$ is the p-dimensional identity matrix, $0_{p}$ is the p-dimensional all-zeros vector, and diag($\mathbf{x}$) forms a matrix that contains the entries of $\mathbf{x}$ on its diagonal. $\hat{β}$ refers to the estimator (or estimate) of the parameter vector β, and ( · )^⊤ is the transpose operator. The derivative of a function f with respect to its argument is abbreviated by f′. P(X) is the probability of event X. Bin(1, ϵ) denotes the binomial distribution with one trial and a success probability of ϵ. Convergence in distribution to the normal distribution with mean vector μ and covariance matrix Σ is denoted by $\overset{d}{\to} \mathcal{N} (μ, Σ)$.

Organization: Section 2 discusses the Tukey-Huber and Independent Contamination models and motivates the use of cellwise robust methods. Section 3 introduces the proposed estimator and provides algorithms to compute the estimates. Section 4 provides numerical experiments, while Section 5 contains a real-data application of source estimation for an atmospheric inverse problem. Finally, Section 6 concludes the paper with a brief outlook on future work.

Section snippets

Tukey-Huber and independent contamination models

In this section, we review the Tukey-Huber and Independent Contamination models and discuss their advantages and drawbacks in modeling outliers in the data.

Proposed method

In this section, we introduce a new method called the MM-Robust Weighted Adaptive Lasso (MM-RWAL).

Simulation setup

Two different Monte Carlo studies are conducted to assess the performance of the proposed MM-RWAL.

Scenario 1: p > n, correlated predictors, cellwise outliers

A setup with $p = 50$ predictors and $n = 30$ observations is considered. The regression parameters are defined by $β_{j} = j / 5$ for $j \in \{1, \dots, 5\}$, while $β_{j} = 0$ for $j \in \{6, \dots, 50\}$. Correlated predictors $x_{j}$, $j \in \{1, \dots, p\}$, are generated by sampling from a multivariate zero-mean Gaussian distribution with covariance matrix $Σ_{i j} = 0.5^{| i - j |}$, $\forall i, j \in \{1, \dots, p\}$. The errors u_i are zero mean
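
The clean part of this setup can be sketched in NumPy as follows. The snippet above breaks off before the error distribution and the contamination mechanism are specified, so the sketch assumes standard Gaussian errors and generates no outliers; the function name `make_scenario1` and its arguments `noise_std` and `seed` are hypothetical.

```python
import numpy as np


def make_scenario1(n=30, p=50, rho=0.5, noise_std=1.0, seed=0):
    """Generate clean data with beta_j = j/5 (j <= 5) and Sigma_ij = rho^|i-j|."""
    rng = np.random.default_rng(seed)

    # Sparse parameter vector: five active coefficients 0.2, 0.4, ..., 1.0.
    beta = np.zeros(p)
    beta[:5] = np.arange(1, 6) / 5.0

    # Covariance matrix of the predictors with entries rho^|i-j|.
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])

    # Rows of X are independent draws from N(0, Sigma).
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

    # Assumption: zero-mean Gaussian errors with standard deviation noise_std.
    u = noise_std * rng.standard_normal(n)
    y = X @ beta + u
    return X, y, beta, Sigma


X, y, beta, Sigma = make_scenario1()
print(X.shape, y.shape, np.flatnonzero(beta))
```

Cellwise outliers would then be introduced by replacing selected entries of X, following the contamination scheme described in the full text.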

A Real Data example of source estimation for an atmospheric inverse problem

Quantifying the emissions of a pollutant into the atmosphere is essential, for example, in the case of nuclear power plant accidents, volcanic eruptions, or to track the releases of greenhouse gases. In this paper, we apply penalized robust estimation to determine the temporal releases of the particles of the European Tracer Experiment (ETEX) at the source location. During the ETEX experiment, tracers (perfluorocarbons) were released into the atmosphere in Monterfil, Brittany, in 1994. Hourly

Conclusion

The problem of finding sparse solutions to under-determined, or ill-conditioned, linear regression problems that are contaminated by cellwise and rowwise outliers was investigated. We introduced and analyzed a robustly weighted and adaptive Lasso type regularization term and integrated it into the objective function of the MM-estimator, resulting in the proposed MM-Robust Weighted Adaptive Lasso (MM-RWAL). The regularization term takes into account cellwise outlyingness in the regression matrix

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The authors would like to thank Marta Martinez-Camara and Martin Vetterli for making us aware of the ETEX experiment and for many interesting discussions on this application.

The work of Jasin Machkour has been funded by the German Research Foundation (DFG) under grant number 425884435.

The work of Michael Muma has been funded by the LOEWE initiative (Hesse, Germany) within the emergenCITY centre and is supported by the ‘Athene Young Investigator Programme’ of Technische Universität Darmstadt,

References (37)

A. Leung et al., Robust regression estimation and inference in the presence of cellwise and casewise contamination, Comput. Stat. Data Anal. (2016)
H. Lian et al., Nonconvex penalized reduced rank regression and its oracle properties in high dimensions, J. Multivar. Anal. (2016)
E. Smucler et al., Robust and sparse estimators for linear regression models, Comput. Stat. Data Anal. (2017)
C. Croux et al., Algorithms for projection-pursuit robust principal component analysis, Chemometr. Intell. Lab. Syst. (2007)
A. Zoubir et al., Robust estimation in signal processing: a tutorial-style treatment of fundamental concepts, IEEE Signal Process. Mag. (2012)
A. Zoubir et al., Robust Statistics for Signal Processing (2018)
E. Elhamifar et al., Sparse subspace clustering: algorithm, theory, and applications, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
G. Liu et al., Robust recovery of subspace structures by low-rank representation, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
R. Couillet et al., Robust estimates of covariance matrices in the large dimensional regime, IEEE Trans. Inf. Theory (2014)
F. Pascal et al., Generalized robust shrinkage estimator and its application to STAP detection problem, IEEE Trans. Signal Process. (2014)
M. Martinez-Camara et al., A robust method for inverse transport modeling of atmospheric emissions using blind outlier detection, Geosci. Model Dev. (2014)
M. Martinez-Camara et al., A new robust and efficient estimator for ill-conditioned linear inverse problems with outliers, Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (2015)
W.-J. Zeng et al., Outlier-robust greedy pursuit algorithms in ℓ_p-space for sparse approximation, IEEE Trans. Signal Process. (2016)
E. Ollila, Adaptive lasso based on joint M-estimation of regression and scale, Proc. Eur. Signal Process. Conf. (2016)
F.R. Hampel et al., Robust Statistics: The Approach Based on the Influence Function (2005)
R.A. Maronna et al., Robust Statistics (2006)
P.J. Huber et al., Robust Statistics (2009)
P.J. Rousseeuw, W. Van den Bossche, Detecting deviating data cells,...

Cited by (6)

Robust statistical methods for high-dimensional data, with applications in tribology
2023, Analytica Chimica Acta
Data sets derived from practical experiments often pose challenges for (robust) statistical methods. In high-dimensional data sets, more variables than observations are recorded and often, there are also data present that do not follow the structure of the data majority. In order to handle such data with outlying observations, a variety of robust regression and classification methods have been developed for low-dimensional data. The high-dimensional case, however, is more challenging, and the variety of robust methods is much more limited. The choice of the method depends on the specific data structure, and numerical problems are more likely to occur. We give an overview of selected robust methods as well as implementations and demonstrate the application with two high-dimensional data sets from tribology. We show that robust statistical methods combined with appropriate pre-processing and sampling strategies yield increased prediction performance and insight into data differing from the majority.
Sparse regression for large data sets with outliers
2022, European Journal of Operational Research
Citation Excerpt :
However, the former cannot be computed in high-dimensional settings; the latter is numerically unstable and does not give sparse regression coefficients. Another recent proposal was made by Machkour, Alt, Muma, and Zoubir (2017), Machkour, Muma, Alt, and Zoubir (2020) who define a measure of outlyingness for each row and column of the data matrix and combine this into an outlyingness score for each cell. Their method is computationally demanding, the proposed sparse shooting S, in contrast, does not suffer from this drawback.
The linear regression model remains an important workhorse for data scientists. However, many data sets contain many more predictors than observations. Besides, outliers, or anomalies, frequently occur. This paper proposes an algorithm for regression analysis that addresses these features typical for big data sets, which we call “sparse shooting S”. The resulting regression coefficients are sparse, meaning that many of them are set to zero, hereby selecting the most relevant predictors. A distinct feature of the method is its robustness with respect to outliers in the cells of the data matrix. The excellent performance of this robust variable selection and prediction method is shown in a simulation study. A real data application on car fuel consumption demonstrates its usefulness.
High-Dimensional False Discovery Rate Control for Dependent Variables
2024, arXiv
Implementation of adaptive lasso regression based on multiple Theil-Sen Estimators using differential evolution algorithm with heavy tailed errors
2022, Journal of the National Science Foundation of Sri Lanka
The Terminating-Random Experiments Selector: Fast High-Dimensional Variable Selection with False Discovery Rate Control
2021, arXiv
Method of source identification following an accidental release at an unknown location using a lagrangian atmospheric dispersion model
2021, Atmosphere
