An extension of the Gauss–Newton algorithm for estimation under asymmetric loss

https://doi.org/10.1016/j.csda.2004.08.007

Abstract

Estimators obtained by the use of the relevant loss function lead to forecasts with good properties when the same loss function is used to evaluate the forecasts. The provided extension of the Gauss–Newton algorithm is tailored to the associated optimization problem. Owing to an approximation of the second derivative of the loss function, it can be viewed as a succession of linear generalized least-squares regressions and is easy to implement. Smoothing loss functions that do not possess derivatives is shown to be asymptotically valid. The extension performs well compared to the Newton (with exact Hessian) and BFGS algorithms in a Monte Carlo study employing different loss functions and several autoregressive models.

Introduction

The importance of minimum mean-squared error as an optimality criterion in forecasting, which implies the use of the conditional mean as the optimal forecast, is increasingly being challenged by the use of alternative loss functions in evaluating predictions. Sets of sufficient conditions under which the conditional mean delivers optimal predictions for other loss functions are given, for instance, in Granger (1999), but these are rather the exception. In addition, the way the assumed model is estimated may play a decisive role in the performance of the forecasting procedure, as pointed out by Weiss and Andersen (1984). As a solution, Granger (1969) and Weiss (1996) suggest that the estimation of the model parameters should be done using the same loss function as in forecasting; consequently, one has to minimize an objective function based on that loss function.

Analytic solutions to this minimization problem rarely exist, so the minimization is handled by a numerical method. Naturally, each method has its advantages and disadvantages. For a survey of optimization methods used in econometrics, see Davidson (2000, Section 9.2) or Judge et al. (1985, Appendix B).

Used for fitting nonlinear least-squares regressions, the Gauss–Newton (GN) algorithm is a Newton-like method for which the Hessian matrix of the objective function is approximated in such a way that the optimization procedure can be interpreted as a succession of linear regressions; furthermore, the approximated Hessian is always positive definite, making the GN method a popular choice in empirical work. Wedderburn (1974) points out that the GN algorithm also delivers maximum-likelihood (ML) estimates if innovations are assumed Gaussian. The properties of nonlinear least-squares estimators are reviewed, among others, by Mittelhammer et al. (2000, Section 8).
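For reference, the classical GN step for the nonlinear least-squares objective can be summarized as follows; the notation ($J_n$, $u_n$) is ours and not taken from the paper:
$$Q(\theta)=\sum_{t=1}^{T}\bigl(y_t-f(x_t;\theta)\bigr)^{2},\qquad \theta_{n+1}=\theta_n+\bigl(J_n^{\top}J_n\bigr)^{-1}J_n^{\top}u_n,$$
where $J_n$ is the $T\times K$ Jacobian with entries $\partial f(x_t;\theta_n)/\partial\theta_k$ and $u_n$ collects the residuals $y_t-f(x_t;\theta_n)$. The step equals the OLS coefficient vector from regressing $u_n$ on $J_n$, and the approximated Hessian $J_n^{\top}J_n$ is positive definite whenever $J_n$ has full column rank.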

Several extensions have been proposed for more general ML-estimation problems. Bard (1974, pp. 97–99) gives a generalization that allows for dependence structures in the innovations and for a departure from normality. Green (1984) indicates that maximization of a likelihood function can also be written as a succession of weighted linear regressions; his approach, “iteratively reweighted least squares” (IRLS), directly uses the properties of the likelihood and approximates the Hessian with the information matrix. A prominent case covered by IRLS is the BHHH algorithm (Berndt et al., 1974), for which the Hessian is approximated by the average outer product of gradients, due to the information matrix equality. More recently, an iteratively reweighted scheme was proposed by Basu and Lindsay (2004) for minimum distance estimation.
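For concreteness (in our notation, not the paper's), the BHHH approximation replaces the Hessian of the negative log-likelihood by the sum of outer products of the per-observation scores,
$$-\frac{\partial^{2}}{\partial\theta\,\partial\theta^{\top}}\sum_{t=1}^{T}\log\ell_t(\theta)\;\approx\;\sum_{t=1}^{T}g_t(\theta)\,g_t(\theta)^{\top},\qquad g_t(\theta)=\frac{\partial\log\ell_t(\theta)}{\partial\theta},$$
an approximation justified asymptotically by the information matrix equality at the true parameter value.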

It can be shown that not every loss function can be derived from a likelihood function in models with additive zero-mean innovations. Moreover, in forecasting, the loss function to be used is imposed externally by the beneficiary of the forecast rather than derived from the assumed statistical model. When such a loss function is used for inference, ML arguments (like those underlying IRLS) no longer apply. To our knowledge, there is no optimization method that accounts for the special structure of a minimum aggregated loss problem. Therefore, we give an extension of the GN algorithm for a class of more general loss functions.
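Concretely, with additive innovations the estimation problem targeted here is (in our notation)
$$\hat{\theta}=\arg\min_{\theta}\sum_{t=1}^{T}L\bigl(y_t-f(x_t;\theta)\bigr),$$
which reduces to nonlinear least squares for $L(u)=u^{2}$ but, for a general asymmetric $L$, need not correspond to the likelihood of any model with additive zero-mean innovations.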

The remainder of the paper is structured as follows: in Section 2, we describe how asymmetric loss estimation works for location and location/scale processes, with particular attention to the effects of distribution misspecification. In Section 3, while preserving the succession-of-linear-regressions interpretation of GN, an optimization procedure for the class of loss functions with continuous second derivative is given; we also show the use of approximating loss functions to be asymptotically valid, thus extending the method to loss functions that do not exhibit the desired degree of smoothness. In Section 4, the proposed method is studied for linear and nonlinear models, as well as for several different loss functions, and Section 5 concludes.

Section snippets

Estimation under asymmetric loss

Let $Y_t$, $t\in\mathbb{R}$, denote the process to be forecast. The optimal forecast, or optimal predictor, under the imposed loss function minimizes the expected loss, or risk, of the prediction of $Y$ at time $t^*$, given the information set available. Typically, the information set at time $t^*$ consists of lagged values of the process.
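In symbols (our notation), the optimal predictor $\hat{Y}_{t^*}$ under loss $L$, given the information set $\mathcal{I}_{t^*}$, solves
$$\hat{Y}_{t^*}=\arg\min_{\hat{y}}\;E\bigl[L\bigl(Y_{t^*}-\hat{y}\bigr)\,\big|\,\mathcal{I}_{t^*}\bigr],$$
so computing it exactly requires the conditional distribution of $Y_{t^*}$ given $\mathcal{I}_{t^*}$.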

In order to obtain optimal forecasts in the general case, one should model the conditional density for each t* where a forecast is desired. This, however, is not always feasible.

The extended GN method

While the previous considerations did not restrict the loss function to a particular class, this generality will not be maintained in the following discussion. For the proposed extension (EGN) to remain in the Newton family, we require the loss function to possess a continuous second derivative and the optimal predictor to possess second partial derivatives w.r.t. $\theta_1$ through $\theta_K$.

The $k$th component of the gradient $\gamma$ of the loss function in the $n$th iteration is
$$\gamma_k(\theta_n)=\sum_{t=1}^{T}\frac{dL}{du}\cdot\frac{\partial u_t}{\partial\theta_k}=-\sum_{t=1}^{T}\frac{dL}{du}\bigl(y_t-f(x_t;\theta_n)\bigr)\cdot\frac{\partial f(x_t;\theta_n)}{\partial\theta_k}$$
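The snippet does not show the Hessian approximation in full, but the natural GN-type choice, consistent with the succession-of-GLS-regressions description in the abstract, is to drop the term involving second derivatives of $f$ and keep $\sum_t L''(u_t)\,\partial f/\partial\theta\,\partial f/\partial\theta^{\top}$. A minimal Python sketch of the resulting iteration under that assumption (function names and interfaces are ours, not the paper's):

```python
import numpy as np

def egn_step(theta, y, X, f, jac, dL, d2L):
    """One step of a GN-type iteration for minimizing sum_t L(y_t - f(x_t; theta)).

    Assumed interfaces (ours, not the paper's exact implementation):
      f(X, theta)   -> fitted values, shape (T,)
      jac(X, theta) -> Jacobian of f w.r.t. theta, shape (T, K)
      dL, d2L       -> first and second derivative of the loss L(u)
    """
    u = y - f(X, theta)                 # residuals u_t
    J = jac(X, theta)                   # T x K Jacobian of the predictor
    w = d2L(u)                          # GN-type weights L''(u_t)
    # Approximated Hessian: J' diag(w) J; gradient: -J' dL(u).
    JW = J * w[:, None]
    step = np.linalg.solve(J.T @ JW, J.T @ dL(u))
    return theta + step

def egn(theta0, y, X, f, jac, dL, d2L, tol=1e-8, max_iter=200):
    """Iterate egn_step until the parameter change falls below tol."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        theta_new = egn_step(theta, y, X, f, jac, dL, d2L)
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    return theta
```

Each step is equivalent to a weighted least-squares regression of the working responses $L'(u_t)/L''(u_t)$ on the Jacobian with weights $L''(u_t)$; for convex losses such as linex, $L''(u)>0$, so the weights and the approximated Hessian remain positive (definite for a full-rank Jacobian).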

Simulations

For the simulation experiments in this section, we follow the outline of Example 1. All simulations are carried out by means of GAUSS, kernel rev. 5.0.25, running on a Windows XP computer with an AMD XP2600+ CPU and 3 GB of RAM.

The data is generated according to different autoregressive models, all having starting values 0. All innovations are standard normal. We use the linex loss and the double linex loss, given by
$$L_{\mathrm{dle}}(u)=e^{au}+e^{-bu}-(a-b)u,$$
for positive $a,b$. For convenience, the latter is taken
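For illustration, the linex loss in the common Varian/Zellner form and the double linex loss as displayed above, together with the derivatives needed by the iteration sketched in the previous section, could be coded as follows (the paper's exact normalisation and parameter values are not recoverable from this snippet):

```python
import numpy as np

# Linex loss in the common Varian/Zellner form L(u) = exp(a*u) - a*u - 1;
# the paper's exact normalisation is not visible in this snippet.
def linex(u, a):
    return np.exp(a * u) - a * u - 1.0

def d_linex(u, a):
    return a * (np.exp(a * u) - 1.0)

def d2_linex(u, a):
    return a * a * np.exp(a * u)

# Double linex loss as displayed above: L(u) = exp(a*u) + exp(-b*u) - (a - b)*u,
# with a, b > 0 (any additive constant is irrelevant for the minimizer).
def dlinex(u, a, b):
    return np.exp(a * u) + np.exp(-b * u) - (a - b) * u

def d_dlinex(u, a, b):
    return a * np.exp(a * u) - b * np.exp(-b * u) - (a - b)

def d2_dlinex(u, a, b):
    return a * a * np.exp(a * u) + b * b * np.exp(-b * u)
```

With the hypothetical egn routine sketched earlier, estimation under double linex loss with, say, a = 1 and b = 0.5 would read egn(theta0, y, X, f, jac, lambda u: d_dlinex(u, 1.0, 0.5), lambda u: d2_dlinex(u, 1.0, 0.5)).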

Conclusions

A brief review and some refinements of estimation under asymmetric loss are given, relating to robustness against misspecification of the innovation distribution. Turning to the optimization aspect, this paper proposes an extension of the GN algorithm for the class of loss functions with continuous second derivative. Approximation results for smoothed loss functions are derived to make this extension applicable to non-smooth loss functions. The usefulness of the proposed methods is exemplified by

Acknowledgements

I am grateful to Uwe Hassler, Adina-Ioana Tarcolea and three anonymous referees for helping improve this paper.

References (26)

  • R. Fletcher, Practical Methods of Optimization (1987)

  • C.W.J. Granger, Prediction with a generalized cost of error function, Oper. Res. Quart. (1969)

  • C.W.J. Granger, Outline of forecast theory using generalized cost functions, Spanish Econom. Rev. (1999)