General matching quantiles M-estimation
Introduction
The matching quantiles estimation (MQE) method was first proposed by Sgouropoulos et al. (2015) as a way to address the problem of estimating representative portfolios for backtesting counterparty credit risks. The goal is to construct a representative portfolio such that its distribution matches that of the total counterparty portfolio. Instead of matching the two distributions directly, MQE aims to minimize the mean-squared difference between the quantiles of the two distributions across all levels.
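As a concrete illustration of this criterion, the following minimal sketch computes the mean-squared difference between the empirical quantiles of the target and of a candidate linear combination. The quantile grid, function name, and simulated data are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def mqe_objective(y, X, beta, n_levels=100):
    """Mean-squared difference between the empirical quantiles of the
    target y and of the candidate portfolio X @ beta, averaged over a
    grid of quantile levels (an illustrative discretization)."""
    levels = (np.arange(1, n_levels + 1) - 0.5) / n_levels
    q_target = np.quantile(y, levels)
    q_fitted = np.quantile(X @ beta, levels)
    return np.mean((q_target - q_fitted) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
beta_true = np.array([1.0, 0.5, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=500)

# The objective is small when the two distributions nearly coincide
# and large for a badly mismatched candidate.
print(mqe_objective(y, X, beta_true))
print(mqe_objective(y, X, np.zeros(3)))
```

Note that the criterion compares distributions, not paired observations: the two samples enter only through their sorted values.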
The potential usefulness of matching quantiles methods extends beyond estimating representative portfolios. Sgouropoulos et al. (2015) pointed out that MQE could also be applied in other (non-finance) contexts, such as atmospheric sciences, where measurements are not necessarily taken simultaneously. The idea of matching quantiles has been explored in other applications. Dominicy and Veredas (2013) introduced a method of simulated quantiles and used it to analyze twenty-two financial indexes. Li et al. (2010) developed an equidistant quantile-matching (EQM) method for bias correction of the monthly precipitation and temperature fields published by the Intergovernmental Panel on Climate Change, and reported that the method was more efficient than traditional direct distribution-function mapping. More recent work by Srivastav et al. (2014) presented an EQM-based methodology for updating intensity–duration–frequency (IDF) curves under climate change.
Although MQE achieves high goodness in matching distributions (see Eq. (4.3) for the definition of a measure of goodness), it is sensitive to outliers because it optimizes an ordinary least squares (OLS) objective; the presence of outliers therefore degrades its performance. Classic work in the literature suggests statistical procedures that can modify the MQE to reduce this sensitivity. M-estimation-based procedures can play an important and complementary role in forming a more robust MQE. M-estimation, a maximum-likelihood-type estimation, was originally proposed by Huber (1964). Since a proper choice of the discrepancy function yields robustness against outliers, M-estimation has received considerable attention in the literature and has been applied in many fields. Recent examples include the following: (1) Lambert-Lacroix and Zwald (2011) proposed an M-estimator combining Huber's criterion with the Lasso penalty, which is resistant to heavy-tailed errors and outliers in the response variable; (2) Zhang et al. (2016) applied adaptive Huber M-estimation to the cubature Kalman filter to handle abnormal measurement noise, with simulation studies demonstrating increased estimation accuracy, outlier-robustness, and reliability; (3) Ollila et al. (2016) introduced two penalized M-estimation methods for the joint estimation of group covariance matrices.
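To illustrate why an M-estimation step helps, the following sketch contrasts the sample mean (the OLS estimate of location) with a Huber M-estimate on data containing gross outliers. The iteratively reweighted scheme, the conventional threshold c = 1.345, and the simulated data are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def huber_location(x, c=1.345, n_iter=50):
    """Huber M-estimate of location via iteratively reweighted averaging.

    Residuals larger than c in magnitude receive weight c/|r| instead of 1,
    so gross outliers contribute little to the estimate."""
    t = np.median(x)  # robust starting value
    for _ in range(n_iter):
        r = x - t
        w = np.where(np.abs(r) <= c, 1.0, c / np.maximum(np.abs(r), 1e-12))
        t = np.sum(w * x) / np.sum(w)
    return t

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(size=200), [50.0, 60.0, 70.0]])  # three gross outliers

print(x.mean())            # the least-squares (sample-mean) estimate is dragged upward
print(huber_location(x))   # the M-estimate stays near the bulk of the data
```

The quadratic loss lets each observation pull the estimate in proportion to its residual, while the Huber loss caps that influence, which is the mechanism exploited throughout this paper.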
In this paper, we propose a general enhancement of MQE by replacing the OLS estimation with M-estimation. We show that, in addition to being resistant to outliers, the proposed matching quantiles M-estimate (MQME), like the MQE, is consistent. In many modern problems, the sample size and the number of candidate variables are large while the number of relevant variables is small, which makes a 'sparse' matching quantiles estimate highly desirable. We therefore also develop a sparse MQME by combining the MQME with the adaptive Lasso penalty. As with the original MQME, the sparse variant is expected to be robust to outlying observations.
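The sparsity mechanism can be sketched as follows. The helper below is hypothetical, not the paper's estimator: the adaptive Lasso weights each coefficient by an inverse power of a pilot estimate, and the associated proximal (soft-thresholding) operator sets small coefficients exactly to zero.

```python
import numpy as np

def adaptive_lasso_penalty(beta, beta_pilot, lam, gamma=1.0):
    """Adaptive Lasso penalty lam * sum_j |beta_j| / |beta_pilot_j|**gamma.

    Coefficients whose pilot estimates are small receive large weights and
    are shrunk all the way to zero; large pilot coefficients are penalized
    only lightly. (beta_pilot would be an initial consistent fit, e.g. the
    unpenalized MQME; lam and gamma are tuning parameters.)"""
    w = 1.0 / (np.abs(beta_pilot) ** gamma + 1e-12)  # offset avoids division by zero
    return lam * np.sum(w * np.abs(beta))

def soft_threshold(z, t):
    """Proximal operator of an L1 term: the mechanism that produces exact zeros."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

beta = np.array([0.9, 0.1])
pilot = np.array([1.0, 0.05])  # the second coefficient looks irrelevant
# The penalty charges the 'irrelevant' coordinate 0.1/0.05 = 2.0,
# but the 'relevant' one only 0.9/1.0 = 0.9.
print(adaptive_lasso_penalty(beta, pilot, lam=1.0))
print(soft_threshold(np.array([0.3, -2.0]), 0.5))  # the small value snaps to exactly 0
```

The data-adaptive weights are what distinguish the adaptive Lasso from the plain Lasso, allowing sparsity without over-shrinking the truly relevant coefficients.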
The rest of this paper is organized as follows. In Section 2, we introduce the MQME method. We discuss its theoretical properties in Section 3. Numerical experiments of varying designs are explored in Section 4, followed by a real case study of the stock market index in Hong Kong during the period 2013–2016 in Section 5. Finally, we draw our conclusions in Section 6. All proofs of the presented theoretical results can be found in the Appendix.
The following notation will be used in subsequent sections:
- $\mathbb{R}^p$ denotes the real $p$-dimensional space.
- $\mathbf{X} = (X_1, \ldots, X_p)^{\top}$ is a (column) vector of random variables, and $\{\mathbf{X}_t\}_{t=1}^{n}$ is the set of the observations of $\mathbf{X}$. (Note that boldface is used only if $p > 1$.)
- $Y$ is the target random variable, and $\{Y_t\}_{t=1}^{n}$ is the set of the observations of $Y$.
- $\boldsymbol{\beta}$ is a $p$-dimensional regression coefficient vector.
- For some generic random variable $Z$, $\mathcal{L}(Z)$, $F_Z$ and $f_Z$ respectively denote its distribution, distribution function and probability density function. In addition, $F_Z^{-1}(\alpha)$ denotes its $\alpha$th quantile, i.e., $F_Z^{-1}(\alpha) = \inf\{z : F_Z(z) \ge \alpha\}$.
- For some generic collection of samples $\{Z_t\}_{t=1}^{n}$ of a random variable $Z$, let the corresponding order statistics be $Z_{(1)} \le \cdots \le Z_{(n)}$. Let $\widehat{F}_n$ be the empirical distribution function, and denote its $\alpha$th quantile by $\widehat{F}_n^{-1}(\alpha)$ for $\alpha \in (0, 1)$.
- Denote convergence in probability by '$\stackrel{P}{\longrightarrow}$' and convergence almost surely by '$\stackrel{\mathrm{a.s.}}{\longrightarrow}$'.
The methods
In this section, we provide details of the proposed MQME method. In Section 2.1, we formally introduce MQME. In Section 2.2, we propose the sparse MQME. In Section 2.3, we present an iterative algorithm for computing both MQME and sparse MQME. In Section 2.4, we discuss the selection of tuning parameters.
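Ahead of those details, here is a rough sketch of the kind of alternating iteration involved, in the spirit of the MQE algorithm of Sgouropoulos et al. (2015), with OLS standing in for the M-estimation step. Function names, the iteration count, and the simulated data are illustrative assumptions, not the paper's Section 2.3 algorithm.

```python
import numpy as np

def matching_quantiles_fit(y, X, n_iter=50):
    """Sketch of the alternating scheme behind MQE/MQME.

    Step 1: pair the order statistics of y with those of the current
            fitted values X @ beta, so quantiles are matched level by level.
    Step 2: re-estimate beta by regressing the reordered y on X.
    OLS is used in Step 2 here; MQME would replace it with a robust
    M-estimation step."""
    y_sorted = np.sort(y)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # initial fit
    for _ in range(n_iter):
        order = np.argsort(X @ beta)       # ranks of the current fitted values
        y_matched = np.empty_like(y_sorted)
        y_matched[order] = y_sorted        # smallest y paired with smallest fit, etc.
        beta = np.linalg.lstsq(X, y_matched, rcond=None)[0]
    return beta

def quantile_mse(y, X, b):
    """Mean-squared difference between sorted targets and sorted fits."""
    return np.mean((np.sort(y) - np.sort(X @ b)) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = X @ np.array([1.0, -0.5]) + 0.1 * rng.normal(size=300)
beta_hat = matching_quantiles_fit(y, X)

# After the iterations, the fitted quantiles track the target quantiles closely.
print(quantile_mse(y, X, beta_hat))
```

Each pass cannot increase the quantile-matching objective: the sorting step is the best pairing of order statistics, and the regression step minimizes the resulting least-squares criterion.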
Theoretical properties
In this section, the convergence of the aforementioned iterative algorithm, and the statistical properties of the matching quantiles M-estimate are presented. Before proceeding, we make the following assumptions.
(A1) $\rho(\cdot)$ is a convex function satisfying $\rho(0) = 0$, and $\rho$ is Lipschitz continuous; that is, there exists a constant $c > 0$ such that for any $u, v \in \mathbb{R}$, $|\rho(u) - \rho(v)| \le c\,|u - v|$.
(A2) For any …, there exists … such that (i) …; (ii) …
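As an aside, conditions of the (A1) type (convexity, a zero at the origin, and a global Lipschitz bound) can be sanity-checked numerically under the assumption that the discrepancy function is Huber's function with threshold c; the check below is an illustration, not part of the paper's proofs.

```python
import numpy as np

# Numerical sanity check, assuming rho is the Huber function with
# threshold c: rho(0) = 0, midpoint convexity, and the global Lipschitz
# bound |rho(u) - rho(v)| <= c * |u - v| (since |rho'| <= c).
c = 1.345

def rho(u):
    a = np.abs(u)
    return np.where(a <= c, 0.5 * u ** 2, c * a - 0.5 * c ** 2)

rng = np.random.default_rng(2)
u = rng.uniform(-10, 10, size=5000)
v = rng.uniform(-10, 10, size=5000)

print(float(rho(0.0)) == 0.0)
print(bool(np.all(rho((u + v) / 2) <= (rho(u) + rho(v)) / 2 + 1e-12)))   # midpoint convexity
print(bool(np.all(np.abs(rho(u) - rho(v)) <= c * np.abs(u - v) + 1e-9)))  # Lipschitz
```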
Simulation study
The simulations are performed under different scenarios, without or with outliers. The $L_1$, $L_2$ and Huber discrepancy functions are chosen for comparison purposes. We remark that the tuning parameters are chosen by five-fold cross-validation. For convenience, the MQME methods based on the Huber, $L_1$ and $L_2$ discrepancy functions are abbreviated in this section as HUBER, LAD and LS, respectively.
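A minimal version of such a comparison can be sketched as follows: an iteratively reweighted Huber fit versus a least-squares fit on contaminated data. The design, contamination scheme, and IRLS loop are illustrative assumptions, not the paper's simulation settings.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.5 * rng.normal(size=n)
y[:10] += 30.0  # contaminate 5% of the responses with gross outliers

# LS fit: minimizes squared error, so the outliers pull the estimate away.
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]

# Huber fit via iteratively reweighted least squares (a standard IRLS
# sketch, not the paper's algorithm); residuals beyond c are downweighted.
c = 1.345
beta_h = beta_ls.copy()
for _ in range(100):
    r = y - X @ beta_h
    w = np.where(np.abs(r) <= c, 1.0, c / np.maximum(np.abs(r), 1e-12))
    sw = np.sqrt(w)
    beta_h = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]

print(np.linalg.norm(beta_ls - beta_true))  # inflated by the outliers
print(np.linalg.norm(beta_h - beta_true))   # stays close to the truth
```

This is the qualitative pattern the HUBER/LAD/LS comparison in this section is designed to quantify.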
A real case study
In this section, a real example is considered to investigate the performance of the (sparse) MQME. Our purpose is to assemble a representative portfolio of different securities that matches various characteristics of a benchmark index. The discrepancy function is chosen to be the $L_1$, $L_2$ or Huber function. The tuning parameters are chosen via five-fold cross-validation.
We apply the proposed method to the stock market index in Hong Kong during the period of 2013–2016. These data are
Conclusions
In this paper, we extend the MQE to the more general MQME, which is resistant to outlying observations. Integrated with the adaptive Lasso penalty, the MQME encourages sparsity in the estimate. Since the MQME does not admit an explicit solution, we propose an iterative algorithm to compute it. The consistency of the matching quantiles M-estimate is investigated under assumptions weaker than those made for the MQE in Sgouropoulos et al. (2015). We demonstrate the effectiveness of MQME through
Acknowledgments
This work is supported by the Natural Sciences and Engineering Research Council of Canada (RGPIN-2017-05720). The first author also gratefully acknowledges the financial support of the China Scholarship Council (Grant No. 201506180073). The authors would like to thank the associate editor and the anonymous reviewer for the critical comments and constructive suggestions, which have led to the improvement of this article.
References (22)
- et al., Robust regularized extreme learning machine for regression using iteratively reweighted least squares, Neurocomputing (2017)
- Dominicy and Veredas, The method of simulated quantiles, J. Econometrics (2013)
- Zhang et al., Adaptive M-estimation for robust cubature Kalman filtering (2016)
- Cantoni and Ronchetti, Robust inference for generalized linear models, J. Amer. Statist. Assoc. (2001)
- M-estimation in cross-over trials, Biometrics (1994)
- El Karoui et al., On robust regression with high-dimensional predictors, Proc. Natl. Acad. Sci. (2013)
- Fan et al., Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions, J. R. Stat. Soc. Ser. B Stat. Methodol. (2017)
- Huber, Robust estimation of a location parameter, Ann. Math. Stat. (1964)
- Huber, Robust regression: asymptotics, conjectures and Monte Carlo, Ann. Statist. (1973)
- Huber, Robust Statistics (1981)
- Robust estimation using modified Huber's functions with new tails, Technometrics