An extension of the Gauss–Newton algorithm for estimation under asymmetric loss
Introduction
The importance of minimum mean-squared error as an optimality criterion in forecasting, which implies the use of the conditional mean as the optimal forecast, is increasingly challenged by the use of alternative loss functions for evaluating predictions. Sets of sufficient conditions under which the conditional mean delivers optimal predictions for other loss functions are given, for instance, in Granger (1999), but these are rather the exception. Moreover, the way the assumed model is estimated may play a decisive role in the performance of the forecasting procedure, as pointed out by Weiss and Andersen (1984). As a solution, Granger (1969) and Weiss (1996) suggest that the model parameters be estimated under the same loss function as that used in forecasting; consequently, one has to minimize an objective function built from that loss function.
Analytic solutions to this minimization problem rarely exist, so the minimization must be carried out numerically. Naturally, each method has its advantages and disadvantages. For a survey of optimization methods used in econometrics, see Davidson (2000, Section 9.2) or Judge et al. (1985, Appendix B).
Used for fitting nonlinear least-squares regressions, the Gauss–Newton (GN) algorithm is a Newton-like method in which the Hessian matrix of the objective function is approximated in such a way that the optimization procedure can be interpreted as a succession of linear regressions; furthermore, the approximated Hessian is always positive definite, making the GN method a popular choice in empirical work. Wedderburn (1974) points out that the GN algorithm also delivers maximum-likelihood (ML) estimates if the innovations are assumed Gaussian. The properties of nonlinear least-squares estimators are reviewed, among others, by Mittelhammer et al. (2000, Section 8).
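To make the succession-of-linear-regressions interpretation concrete, a minimal GN iteration for a one-parameter exponential regression might look as follows (the function names, data, and starting value are our own illustration, not taken from the paper):

```python
import numpy as np

def gauss_newton(f, jac, y, theta0, max_iter=50, tol=1e-10):
    """Gauss-Newton for min_theta sum((y - f(theta))**2).

    Each iteration regresses the current residuals on the Jacobian of f,
    so the update is an ordinary least-squares fit; the implied Hessian
    approximation J'J is positive semi-definite by construction.
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        r = y - f(theta)                               # current residuals
        J = jac(theta)                                 # n x k Jacobian of f
        step, *_ = np.linalg.lstsq(J, r, rcond=None)   # linear regression of r on J
        theta = theta + step
        if np.linalg.norm(step) < tol:
            break
    return theta

# noiseless exponential-decay example: y = exp(-0.7 x)
x = np.linspace(0.0, 4.0, 50)
y = np.exp(-0.7 * x)
f = lambda th: np.exp(-th[0] * x)
jac = lambda th: (-x * np.exp(-th[0] * x)).reshape(-1, 1)
theta_hat = gauss_newton(f, jac, y, [0.1])
```

With noiseless data the least-squares objective has its minimum at the true parameter, which the iteration recovers.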
Several extensions have been proposed for more general ML-estimation problems. Bard (1974, pp. 97–99) gives a generalization which allows for dependence structures in the innovations and for departures from normality. Green (1984) shows that maximization of a likelihood function can also be written as a succession of weighted linear regressions; his approach, "iteratively reweighted least squares" (IRLS), directly uses the properties of the likelihood and approximates the Hessian with the information matrix. A prominent case covered by IRLS is the BHHH algorithm (Berndt et al., 1974), in which the Hessian is approximated by the average outer product of gradients, justified by the information matrix equality. More recently, an iteratively reweighted scheme was proposed by Basu and Lindsay (2004) for minimum distance estimation.
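For intuition, the outer-product-of-gradients idea behind BHHH can be sketched on a toy ML problem; here we fit the rate of an exponential distribution, replacing the negative Hessian of the log-likelihood by the summed outer product of per-observation scores (a one-dimensional illustration of our own, not an algorithm from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.exponential(scale=0.5, size=5000)   # true rate = 2

lam = 1.0                                   # starting value for the rate
for _ in range(50):
    scores = 1.0 / lam - x                  # per-observation score d log f / d lam
    grad = scores.sum()                     # gradient of the log-likelihood
    opg = (scores ** 2).sum()               # BHHH proxy: outer product of gradients
    step = grad / opg                       # Newton-type step with the OPG in place of -H
    lam += step
    if abs(step) < 1e-12:
        break
```

The exponential MLE has the closed form 1/mean(x), so the iteration can be checked against it.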
It can be shown that not every loss function can be derived from a likelihood function in models with additive zero-mean innovations. Moreover, in forecasting, the loss function is imposed externally by the beneficiary of the forecast rather than derived from the assumed statistical model; when such a loss function is used for inference, ML arguments (like those underlying IRLS) no longer apply. To our knowledge, there is no optimization method that accounts for the special structure of a minimum aggregated loss problem. We therefore give an extension of the GN algorithm for a class of more general loss functions.
The remainder of the paper is structured as follows. Section 2 describes how estimation under asymmetric loss works for location and location/scale processes, with particular attention to the effects of distribution misspecification. Section 3 gives an optimization procedure for the class of loss functions with a continuous second derivative, while preserving the succession-of-linear-regressions interpretation of GN; we also show the use of approximating loss functions to be asymptotically valid, thus extending the method to loss functions that lack the required degree of smoothness. Section 4 studies the proposed method for linear and nonlinear models as well as for several different loss functions, and Section 5 concludes.
Section snippets
Estimation under asymmetric loss
Let {Y_t} denote the process to be forecast. The optimal forecast, or optimal predictor, under the imposed loss function minimizes the expected loss, or risk, of the prediction of Y at the target date, given the information set available at the forecast origin. Typically, this information set consists of lagged values of the process.
In order to obtain optimal forecasts in the general case, one should model the conditional density of the process at each date for which a forecast is desired. This, however, is not always feasible.
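As an example of how the optimal predictor departs from the conditional mean: for Gaussian Y with mean mu and variance sigma^2, the predictor minimizing expected linex loss exp(a e) - a e - 1 (with e the forecast error) is known to be mu + a sigma^2 / 2. A quick Monte Carlo check (our own illustration, with arbitrary parameter values):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, a = 1.0, 2.0, 0.5
y = rng.normal(mu, sigma, size=100_000)

def linex(e, a):
    return np.exp(a * e) - a * e - 1.0

# grid-search the forecast f minimizing the Monte Carlo expected loss of e = y - f
grid = np.linspace(mu - 3.0, mu + 3.0, 601)
risk = np.array([linex(y - f, a).mean() for f in grid])
f_star = grid[risk.argmin()]

f_closed = mu + a * sigma ** 2 / 2          # closed-form optimum under Gaussian y
```

For a > 0 the optimal forecast lies above the conditional mean, since under-predictions are penalized exponentially.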
The extended GN method
While the previous considerations did not restrict the loss function to a particular class, this generality will not be maintained in the following discussion. For the proposed extension (EGN) to remain within the Newton family, we require the loss function to possess a continuous second derivative and the optimal predictor to possess second partial derivatives with respect to the model parameters.
The kth component of the gradient of the loss function in the nth iteration is
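In generic notation, one EGN iteration amounts to a weighted linear regression: with residuals e_t = y_t - f_t(theta), regressors g_t = df_t/dtheta, weights L''(e_t), and working responses L'(e_t)/L''(e_t). A minimal sketch for a linear predictor under linex loss (the function names, data, and the simplified Hessian approximation are our own illustration, not taken from the paper):

```python
import numpy as np

def egn(y, X, theta0, dloss, d2loss, n_iter=100, tol=1e-12):
    """Extended Gauss-Newton sketch for min_theta sum_t L(y_t - x_t' theta).

    Each step solves a weighted least-squares problem: weights L''(e_t),
    working responses L'(e_t) / L''(e_t), regressors x_t.
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        e = y - X @ theta
        w = d2loss(e)                          # curvature of the loss at each residual
        # weighted LS step: (X' W X)^{-1} X' W z with W z = L'(e)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ dloss(e))
        theta = theta + step
        if np.linalg.norm(step) < tol:
            break
    return theta

# linex loss with a = 1: L(e) = exp(e) - e - 1
dloss = lambda e: np.exp(e) - 1.0              # first derivative of the loss
d2loss = lambda e: np.exp(e)                   # second derivative, always positive

# noiseless linear data, so the minimum-loss parameter equals the true one
x = np.linspace(0.0, 1.0, 100)
X = x.reshape(-1, 1)
y = 2.0 * x
theta_hat = egn(y, X, [1.5], dloss, d2loss)
```

Because the linex second derivative is strictly positive, the weighted normal equations remain well-posed at every iterate.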
Simulations
For the simulation experiments in this section, we follow the outline of Example 1. All simulations are carried out in GAUSS, kernel rev. 5.0.25, running under Windows XP on an AMD XP2600+ CPU with 3 GB of RAM.
The data are generated according to different autoregressive models, all with starting values of 0, and all innovations are standard normal. We use the linex loss, L(e) = exp(ae) - ae - 1, and the double linex loss, L(e) = exp(ae) + exp(-be) - (a - b)e - 2, for positive a and b. For convenience, the latter is taken
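Assuming the standard parameterizations above, the two losses can be coded and their asymmetry checked directly (the parameter values here are arbitrary):

```python
import numpy as np

def linex(e, a):
    """Linex loss: roughly exponential on one side of zero, linear on the other."""
    return np.exp(a * e) - a * e - 1.0

def double_linex(e, a, b):
    """Double linex loss: exponential penalties on both sides, with rates a and b."""
    return np.exp(a * e) + np.exp(-b * e) - (a - b) * e - 2.0

# both losses vanish at zero error; with a > b, positive errors cost more
e = np.array([-1.0, 0.0, 1.0])
print(linex(e, a=1.0))
print(double_linex(e, a=1.0, b=0.5))
```

Setting a = b makes the double linex symmetric, so the asymmetry is controlled entirely by the gap between the two rates.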
Conclusions
A brief review and some refinements of estimation under asymmetric loss are given, with attention to robustness against misspecification of the innovation distribution. On the optimization side, this paper proposes an extension of the GN algorithm for the class of loss functions with a continuous second derivative. Approximation results for smoothed loss functions are derived to make this extension applicable to non-smooth loss functions. The usefulness of the proposed methods is exemplified by
Acknowledgements
I am grateful to Uwe Hassler, Adina-Ioana Tarcolea and three anonymous referees for helping improve this paper.
References
- Amemiya, T. (1985). Advanced Econometrics.
- Bard, Y. (1974). Nonlinear Parameter Estimation.
- Basu, A., Lindsay, B.G. (2004). The iteratively reweighted estimating equation in minimum distance problems. Comput. Statist. Data Anal.
- Berndt, E.K., Hall, B.H., Hall, R.E., Hausman, J.A. (1974). Estimation and inference in nonlinear structural models. Ann. Econom. Soc. Meas.
- Christoffersen, P.F., Diebold, F.X. (1996). Further results on forecasting and model selection under asymmetric loss. J. Appl. Econom.
- Christoffersen, P.F., Diebold, F.X. (1997). Optimal prediction under asymmetric loss. Econometric Theory.
- Davidson, J. (2000). Econometric Theory.
- Dennis, J.E., Schnabel, R.B. (1996). Numerical Methods for Unconstrained Optimization and Nonlinear Equations.
- Diebold, F.X., Mariano, R.S. (1995). Comparing predictive accuracy. J. Bus. Econom. Statist.
- Fletcher, R. Practical Methods of Optimization.
- Granger, C.W.J. (1969). Prediction with a generalized cost of error function. Oper. Res. Quart.
- Granger, C.W.J. (1999). Outline of forecast theory using generalized cost functions. Spanish Econom. Rev.
- Mak, T.K., Wong, H., Li, W.K. (1997). Estimation of nonlinear time series with conditional heteroscedastic variances by iteratively weighted least squares. Comput. Statist. Data Anal.