A robust adaptive Lasso estimator for the independent contamination model
Introduction
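Many of today’s signal processing problems can be formulated as a linear regression model
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u},$$
where $\mathbf{X} \in \mathbb{R}^{n \times p}$ is the predictor matrix, $\mathbf{u} \in \mathbb{R}^{n}$ is the error vector, $\mathbf{y} \in \mathbb{R}^{n}$ is the response vector, and $\boldsymbol{\beta} \in \mathbb{R}^{p}$ is the unknown parameter vector.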
The presence of outliers and impulsive noise in linear regression problems has been reported in applications as diverse as wireless communication, ultrasonic systems, computer vision, electric power systems, automated detection of defects, biomedical signal analysis, genomics, and the estimation of the temporal releases of a pollutant into the atmosphere. See [1], [2], [3], [4], [5], [6], [7], [8], [9], [10] and references therein. It is well known that violations of the Gaussian noise assumption cause a drastic performance drop for the commonly used least-squares estimator (LSE) [11], [12], [13].
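The sensitivity of the LSE can be illustrated with a minimal sketch. The data below are hypothetical (a noise-free line with one response replaced by a gross outlier) and are not taken from the paper; the function name `least_squares_line` is likewise only an illustrative choice.

```python
def least_squares_line(x, y):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Hypothetical noise-free toy data on the line y = 1 + 2x.
x = [float(i) for i in range(10)]
y_clean = [1.0 + 2.0 * xi for xi in x]

# Same data, but a single response is replaced by a gross outlier.
y_outlier = list(y_clean)
y_outlier[-1] = 1000.0

clean_intercept, clean_slope = least_squares_line(x, y_clean)
bad_intercept, bad_slope = least_squares_line(x, y_outlier)
print(f"clean fit:   intercept={clean_intercept:.2f}, slope={clean_slope:.2f}")
print(f"one outlier: intercept={bad_intercept:.2f}, slope={bad_slope:.2f}")
```

A single contaminated observation out of ten is enough to move the fitted slope far away from the true value of 2, which is the kind of breakdown that motivates robust estimators.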
This sensitivity has led to the development of robust estimators. For decades, the vast majority of robust linear regression estimators have focused on robustness against rowwise contamination. Under the so-called Tukey-Huber contamination model (THCM) [12], even for high-breakdown regression estimators, such as the LTS-, S-, MM-, and τ-estimators [1], [12], only a minority of the rows of X may be contaminated. In [14], Rousseeuw and Van den Bossche state that the outlying-rows paradigm is no longer sufficient for modern high-dimensional data sets. It often happens that most data cells (entries) in a row are regular and just a few of them are anomalous.
The case in which individual cells of X are contaminated independently of one another is referred to as the independent contamination model (ICM) [15], [16], [17]. Only recently have the first cellwise robust regression estimation methods been developed [16], [17]. Developing new cellwise robust estimators is essential for solving many real-world problems. For example, the estimation of the spatio-temporal emissions of a pollutant, given noisy observations, can be formulated as a linear inverse problem with the help of an atmospheric dispersion model [7]. The data of the European Tracer Experiment (ETEX), which was conducted in Monterfil, Brittany, in 1994 and in which perfluorocarbon (PFC) tracers were released into the atmosphere, contain both cellwise and rowwise outliers.
In addition to the robustness considerations, atmospheric inverse problems, like many other problems in signal processing, require finding sparse solutions to under-determined, or ill-conditioned, linear systems of equations. For example, obtaining interpretable models from large data sets, including the case where the number of explanatory variables p is larger than the sample size n, requires penalized estimators, such as the classical least absolute shrinkage and selection operator (Lasso) [18]
$$\hat{\boldsymbol{\beta}}_{\mathrm{Lasso}} = \arg\min_{\boldsymbol{\beta}} \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|_2^2 + \lambda \|\boldsymbol{\beta}\|_1$$
with $\lambda \geq 0$.
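Many other regularizations have been proposed [19], [20], [21], [22], [23]. In this paper, the focus lies on Lasso estimation, to select a robust and interpretable model in high-dimensional settings. Zou [24] showed that Lasso variable selection can be inconsistent, so that the oracle properties do not hold, and proposed the adaptive Lasso
$$\hat{\boldsymbol{\beta}}_{\mathrm{AL}} = \arg\min_{\boldsymbol{\beta}} \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|_2^2 + \lambda \sum_{j=1}^{p} \hat{w}_j |\beta_j|,$$
where $\hat{w}_j = 1/|\hat{\beta}_j|^{\gamma}$ ($\gamma > 0$) are non-negative weights depending on $\hat{\boldsymbol{\beta}}$, which is a $\sqrt{n}$-consistent estimator of $\boldsymbol{\beta}$. For a thorough discussion on consistency of estimators, see [25].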
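The relation between the Lasso and the adaptive Lasso can be sketched in a few lines of NumPy. The following is only an illustration under stated assumptions, not the algorithm used in the paper: it minimizes the objective ||y - Xb||_2^2 + lam * sum_j w_j |b_j| by plain coordinate descent, uses least squares as the pilot estimate (reasonable only when n > p), and adds a small constant `eps` to avoid division by zero. The data, the function names and the tuning values `lam` and `gamma` are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def weighted_lasso(X, y, lam, weights=None, n_iter=200):
    """Coordinate descent for ||y - X b||_2^2 + lam * sum_j w_j * |b_j|.

    weights=None uses w_j = 1 for all j, i.e. the classical Lasso.
    """
    n, p = X.shape
    w = np.ones(p) if weights is None else np.asarray(weights, dtype=float)
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)          # squared column norms x_j^T x_j
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            residual += X[:, j] * beta[j]  # partial residual without predictor j
            z = X[:, j] @ residual
            beta[j] = soft_threshold(z, lam * w[j] / 2.0) / col_sq[j]
            residual -= X[:, j] * beta[j]  # put the updated coordinate back
    return beta

# Hypothetical toy data: n = 100 observations, p = 5 predictors, 2 active.
rng = np.random.default_rng(0)
n, p = 100, 5
beta_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.5 * rng.standard_normal(n)

lam, gamma, eps = 20.0, 1.0, 1e-8          # illustrative tuning values

# Pilot estimate: least squares (assumption; valid here because n > p).
beta_pilot = np.linalg.lstsq(X, y, rcond=None)[0]
adaptive_weights = 1.0 / (np.abs(beta_pilot) + eps) ** gamma

beta_lasso = weighted_lasso(X, y, lam)
beta_adaptive = weighted_lasso(X, y, lam, adaptive_weights)
print("Lasso:         ", np.round(beta_lasso, 3))
print("adaptive Lasso:", np.round(beta_adaptive, 3))
```

Because the weights are large for coefficients whose pilot estimate is close to zero, the adaptive penalty shrinks inactive coefficients much more strongly than active ones, while the unweighted Lasso applies the same threshold to every coefficient.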
Just like the LSE, the Lasso and the adaptive Lasso rely on the Gaussian noise assumption and are sensitive to outliers. In recent years, some robust and regularized approaches have been proposed that replace the penalized squared-error objective function with a penalized bounded objective function [8], [10], [26], [27], [28]. These methods, however, again rely on the THCM.
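The difference between a squared and a bounded objective can be made concrete with a small sketch. Tukey's bisquare function is used here as one common example of a bounded loss; this choice, and the tuning constant c = 4.685 (the usual value for 95% efficiency under Gaussian noise), are assumptions for illustration and are not stated in the text above.

```python
def squared_loss(r):
    """Unbounded squared-error loss: grows without limit in the residual r."""
    return r * r

def tukey_bisquare_rho(r, c=4.685):
    """Tukey's bisquare loss: behaves like r^2/2 near zero, constant for |r| > c."""
    if abs(r) <= c:
        u = (r / c) ** 2
        return (c * c / 6.0) * (1.0 - (1.0 - u) ** 3)
    return c * c / 6.0

# Compare both losses on a small, a moderate and two gross residuals.
for r in (0.5, 3.0, 10.0, 1000.0):
    print(f"r={r:7.1f}  squared={squared_loss(r):12.2f}  "
          f"bisquare={tukey_bisquare_rho(r):.4f}")
```

Once a residual exceeds c, its contribution to the bounded objective no longer grows, so a single gross outlier cannot dominate the fit in the way it does under the squared loss.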
Especially in cases where the number of predictors p exceeds the number of rows n, it becomes increasingly likely that a few highly contaminated predictors force the THCM-based estimators to flag all data points as outliers, which makes it impossible to draw any inferences from the data. This scenario is illustrated in Fig. 1, where a predictor matrix with n data points and p variables is depicted. In this example, the first predictor contains two outliers and the second predictor contains three outliers. For classical robust estimators, such as the least trimmed squares estimator (LTS-estimator) [29] or the MM-estimator [12], and their sparsity-inducing counterparts, namely the Sparse LTS-estimator [27] and the Sparse MM-estimator [28], the depicted predictor matrix appears to be fully contaminated. The reason is that, for these estimators, a single contaminated cell in the predictor matrix leads to flagging the whole corresponding data point as an outlier, which is then removed (LTS-estimator and Sparse LTS-estimator) or downweighted (MM-estimator and Sparse MM-estimator). This is depicted in the lower part of Fig. 1.
Such scenarios require methods that flag highly contaminated predictors and decrease their likelihood of being selected as active variables by Lasso type estimators.
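How quickly rowwise flagging breaks down can be quantified with a short sketch. It assumes that every cell of the predictor matrix is contaminated independently with probability ϵ, i.e. the contamination indicator of each cell follows Bin(1, ϵ); under this assumption a row is clean with probability (1 - ϵ)^p. The value ϵ = 0.02 and the function names are hypothetical choices for illustration.

```python
import random

def expected_contaminated_row_fraction(eps, p):
    """P(row has at least one outlying cell) for independent Bin(1, eps) cells."""
    return 1.0 - (1.0 - eps) ** p

def simulated_contaminated_row_fraction(eps, n, p, seed=0):
    """Monte Carlo check: fraction of n rows with at least one flagged cell."""
    rng = random.Random(seed)
    flagged_rows = 0
    for _ in range(n):
        # A row counts as contaminated as soon as one of its p cells is outlying.
        if any(rng.random() < eps for _ in range(p)):
            flagged_rows += 1
    return flagged_rows / n

eps = 0.02  # hypothetical per-cell contamination probability
for p in (5, 50, 500):
    theory = expected_contaminated_row_fraction(eps, p)
    simulated = simulated_contaminated_row_fraction(eps, n=5000, p=p)
    print(f"p={p:4d}  theory={theory:.3f}  simulated={simulated:.3f}")
```

With only 2% of the cells contaminated, more than half of the rows already contain at least one outlying cell at p = 50, which exceeds what estimators that assume a minority of contaminated rows can tolerate.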
Original Contributions: We propose and analyze a robustly weighted and adaptive Lasso type regularization term, which takes into account cellwise outliers for model selection. The proposed regularization term is integrated into the objective function of the MM-estimator, which yields the proposed MM-Robust Weighted Adaptive Lasso (MM-RWAL). The RWAL penalty can easily be integrated into the objective function of other robust estimators. A performance comparison to existing robust Lasso estimators is provided using Monte Carlo experiments. Further, a real-data application of estimating the sparse non-negative spatio-temporal emissions of a pollutant, given noisy observations and an imprecisely estimated ill-conditioned and sparse dispersion model X, is considered. This example contains both cellwise and rowwise outliers.
Notation: Scalars are denoted by lowercase letters, e.g. $x$, column vectors by bold-faced lowercase letters, e.g. $\mathbf{x}$, and matrices by bold-faced uppercase letters, e.g. $\mathbf{X}$. Sets are denoted by calligraphic letters, each with an associated cardinality. The $j$th column of a matrix $\mathbf{X}$ is denoted by $\mathbf{x}_j$, while $(\mathbf{x})_{i:j}$ denotes the vector that contains the entries $i$ to $j$ of vector $\mathbf{x}$. The $i$th element of vector $\mathbf{x}$ is denoted by $x_i$, $\mathbf{I}_p$ is the $p$-dimensional identity matrix, $\mathbf{0}_p$ is the $p$-dimensional all-zeros vector, and $\mathrm{diag}(\mathbf{x})$ forms a matrix that contains the entries of $\mathbf{x}$ on its diagonal. $\hat{\boldsymbol{\beta}}$ refers to the estimator (or estimate) of the parameter vector $\boldsymbol{\beta}$, and $(\cdot)^{\top}$ is the transpose operator. The derivative of a function $f$ with respect to its argument is abbreviated by $f'$. $P(X)$ is the probability of event $X$. $\mathrm{Bin}(1, \epsilon)$ denotes the binomial distribution with one trial and a success probability of $\epsilon$. Convergence to the normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ is denoted by $\rightarrow_d \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$.
Organization: Section 2 discusses the Tukey-Huber and Independent Contamination models and motivates the use of cellwise robust methods. Section 3 introduces the proposed estimator and provides algorithms to compute the estimates. Section 4 provides numerical experiments, while Section 5 contains a real-data application of source estimation for an atmospheric inverse problem. Finally, Section 6 concludes the paper with a brief outlook on future work.
Section snippets
Tukey-Huber and independent contamination models
In this section, we review the Tukey-Huber and independent contamination models and discuss their advantages and drawbacks in modeling outliers in the data.
Proposed method
In this section, we introduce a new method called the MM-Robust Weighted Adaptive Lasso (MM-RWAL).
Simulation setup
Two different Monte Carlo studies are conducted to assess the performance of the proposed MM-RWAL.
Scenario 1: p > n, correlated predictors, cellwise outliers
A setup with predictors and observations is considered. The regression parameters are defined by while . Correlated predictors xj, are generated by sampling from a multivariate zero mean Gaussian distribution with covariance matrix . The errors ui are zero mean
A real-data example of source estimation for an atmospheric inverse problem
Quantifying the emissions of a pollutant into the atmosphere is essential, for example, in the case of nuclear power plant accidents or volcanic eruptions, or to track the releases of greenhouse gases. In this paper, we apply penalized robust estimation to determine the temporal releases of the particles of the European Tracer Experiment (ETEX) at the source location. During ETEX, tracers (perfluorocarbons) were released into the atmosphere in Monterfil, Brittany, in 1994. Hourly
Conclusion
The problem of finding sparse solutions to under-determined, or ill-conditioned, linear regression problems that are contaminated by cellwise and rowwise outliers was investigated. We introduced and analyzed a robustly weighted and adaptive Lasso type regularization term and integrated it into the objective function of the MM-estimator, resulting in the proposed MM-Robust Weighted Adaptive Lasso (MM-RWAL). The regularization term takes into account cellwise outlyingness in the regression matrix
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
The authors would like to thank Marta Martinez-Camara and Martin Vetterli for making us aware of the ETEX experiment and for many interesting discussions on this application.
The work of Jasin Machkour has been funded by the German Research Foundation (DFG) under grant number 425884435.
The work of Michael Muma has been funded by the LOEWE initiative (Hesse, Germany) within the emergenCITY centre and is supported by the ‘Athene Young Investigator Programme’ of Technische Universität Darmstadt.
References (37)
- et al., Robust regression estimation and inference in the presence of cellwise and casewise contamination, Comput. Stat. Data Anal. (2016)
- et al., Nonconvex penalized reduced rank regression and its oracle properties in high dimensions, J. Multivar. Anal. (2016)
- et al., Robust and sparse estimators for linear regression models, Comput. Stat. Data Anal. (2017)
- et al., Algorithms for projection-pursuit robust principal component analysis, Chemometr. Intell. Lab. Syst. (2007)
- et al., Robust estimation in signal processing: a tutorial-style treatment of fundamental concepts, IEEE Signal Process. Mag. (2012)
- et al., Robust Statistics for Signal Processing (2018)
- et al., Sparse subspace clustering: algorithm, theory, and applications, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
- et al., Robust recovery of subspace structures by low-rank representation, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
- et al., Robust estimates of covariance matrices in the large dimensional regime, IEEE Trans. Inf. Theory (2014)
- et al., Generalized robust shrinkage estimator and its application to STAP detection problem, IEEE Trans. Signal Process. (2014)
- A robust method for inverse transport modeling of atmospheric emissions using blind outlier detection, Geosci. Model Dev.
- A new robust and efficient estimator for ill-conditioned linear inverse problems with outliers, Proc. IEEE Int. Conf. Acoust. Speech Signal Process.
- Outlier-robust greedy pursuit algorithms in ℓp-space for sparse approximation, IEEE Trans. Signal Process.
- Adaptive lasso based on joint M-estimation of regression and scale, Proc. Eur. Signal Process. Conf.
- Robust Statistics: The Approach Based on the Influence Function
- Robust Statistics
- Robust Statistics
Cited by (6)
Robust statistical methods for high-dimensional data, with applications in tribology
2023, Analytica Chimica Acta
Sparse regression for large data sets with outliers
2022, European Journal of Operational Research
Citation Excerpt: However, the former cannot be computed in high-dimensional settings; the latter is numerically unstable and does not give sparse regression coefficients. Another recent proposal was made by Machkour, Alt, Muma, and Zoubir (2017), Machkour, Muma, Alt, and Zoubir (2020) who define a measure of outlyingness for each row and column of the data matrix and combine this into an outlyingness score for each cell. Their method is computationally demanding, the proposed sparse shooting S, in contrast, does not suffer from this drawback.
Implementation of adaptive lasso regression based on multiple Theil-Sen Estimators using differential evolution algorithm with heavy tailed errors
2022, Journal of the National Science Foundation of Sri Lanka