Improving stock market volatility forecasts with complete subset linear and quantile HAR models
Introduction
Fluctuations of asset prices are essential for pricing traded assets. It follows that volatility forecasting models play a key role in risk management, investments and security valuation for all assets with uncertain future payoffs. The volatility of asset returns tends to be highly persistent. From early on, this has led researchers to use autoregressive volatility models (e.g., the autoregressive fractionally integrated moving average (ARFIMA) model of Granger & Joyeux, 1980) and the popular generalized autoregressive conditional heteroskedasticity (GARCH) class of latent volatility models proposed by Bollerslev (1986). Many variations of these models have been proposed. However, as high-frequency data have become more accessible, interest has switched to volatility models that use directly observable and measurable volatility, that is, realized volatility, defined as the sum of squared intraday returns (e.g., Andersen et al., 2001a, 2001b; Barndorff-Nielsen and Shephard, 2002). Among these models, the heterogeneous autoregressive (HAR) model of Corsi (2009) has become the new standard to beat. Compared to GARCH models, the HAR model is simple to estimate, as the realized volatility is explained by past daily, weekly and monthly historical volatility components within a linear regression framework. Still, the model accurately captures the long-memory property of volatility. HAR models are also easy to interpret and to extend with new explanatory variables. For example, Patton and Sheppard (2015) included realized semivariances, Andersen et al. (2012) disentangled realized volatility into its jump and continuous components, Bollerslev et al. (2016) exploited the measurement error of the volatility, and Corsi and Reno (2009) and Horpestad et al. (2019) included asymmetric returns.
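The definition of realized volatility as a sum of squared intraday returns can be illustrated with a minimal sketch. The function name, the use of log returns and the toy price path are assumptions made for illustration; the sampling frequency and any noise corrections used in the paper are not shown in this excerpt.

```python
import math

def realized_variance(intraday_prices):
    """Sum of squared intraday log returns for one trading day.

    `intraday_prices` is a hypothetical list of prices sampled at a fixed
    intraday frequency (for example, every five minutes).
    """
    log_returns = [
        math.log(p1 / p0)
        for p0, p1 in zip(intraday_prices[:-1], intraday_prices[1:])
    ]
    return sum(r * r for r in log_returns)

# Toy intraday price path (made-up numbers).
prices = [100.0, 100.5, 100.2, 100.9, 100.4]
rv = realized_variance(prices)
```

A flat price path yields a realized variance of zero, and larger intraday swings increase the measure regardless of their direction.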
Another strand of the literature reports forecasting improvements via machine learning techniques, such as gradient descent boosting, random forest, support vector (quantile) machine, artificial neural network, and deep learning, to predict market volatility, e.g., Baruník and Křehlík (2016), Liu (2019), Ramos-Pérez et al. (2019) and Xu et al. (2019). Our approach is on the border of the two strands of the literature, as we use a data-driven approach that is easily tractable and interpretable if necessary. Specifically, the complete subset approach is a combination of feature engineering and ensemble methods used in machine learning, while the HAR model is a standard econometric approach to predict market volatility.
In this paper, we propose a volatility forecasting model that makes use of realized volatility quantile forecasts to determine the expected volatility. Specifically, we first predict several quantiles of the volatility density; then, we aggregate quantiles of volatility into the expected (point estimate) volatility forecast. Although in our research, volatility density is not the goal but rather a tool towards achieving point forecasts of volatility, our research is related to the scant literature on volatility density forecasting. Berkowitz (2001) argued that volatility density forecasts are important requirements for stress testing of banks, calculating margin requirements and pricing financial derivatives. Volatility density forecasts are usually based on parametric models (e.g., Corsi et al., 2008). Our nonparametric approach to density forecasts is motivated by the work of Gaglianone and Lima (2012), who used quantile regression to predict the distribution of U.S. unemployment and survey forecasts. Quantile regression is appealing, as it does not require the assumption of a parametric form of the conditional distribution of the variable of interest. Unsurprisingly, several others have followed this line of thought. For example, Manzan and Zerom (2013) predicted the distribution of U.S. inflation, and Pedersen (2015) predicted the distribution of equity and bond market returns. Moreover, Meligkotsidou et al. (2019a) have used nonparametric density forecasts to predict the equity market premium.
As tail events are, by definition, rare, predicting quantiles, and specifically the tails of a target variable’s distribution, might lead to large forecast errors, which may reduce the accuracy of the predictions created from quantile forecasts. To reduce forecast errors, Meligkotsidou et al. (2019a) adapted the idea of complete subset linear regression (CSLR) of Elliott et al. (2013). CSLR forecasts the target variable by aggregating forecasts using all model specifications that are possible given a set of K explanatory variables and a number k ≤ K of admissible independent variables. For example, given a linear regression framework and K potential explanatory variables, one could create K forecasts using models with one independent variable (k = 1), K(K − 1)/2 forecasts with k = 2, and so on, up to a single forecast from the model with k = K, i.e., 2^K − 1 forecasts in total. These forecasts can then be combined via a suitable function to obtain the point forecast of interest. Meligkotsidou et al. (2019a) adapted this approach for quantile regression, leading to complete subset quantile regression (CSQR), which we also exploit in this study. Finally, Meligkotsidou et al. (2019b) predicted the monthly level of the U.S. S&P 500 market index volatility using the CSQR approach by expanding a first-order autoregressive model with macroeconomic variables.
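The complete subset idea can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors' implementation: it fits every ordinary least squares model that uses exactly k of the K candidate regressors and combines the forecasts with a simple equal-weighted average. The function names, the toy data and the choice of equal weights are hypothetical.

```python
from itertools import combinations

def ols_fit(X, y):
    """OLS coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def complete_subset_forecast(X, y, x_new, k):
    """Average the forecasts of all OLS models with exactly k regressors.

    Returns the combined forecast and the number of models averaged.
    """
    K = len(X[0])
    forecasts = []
    for subset in combinations(range(K), k):
        X_sub = [[row[j] for j in subset] for row in X]
        beta = ols_fit(X_sub, y)
        forecasts.append(
            beta[0] + sum(b * x_new[j] for b, j in zip(beta[1:], subset))
        )
    return sum(forecasts) / len(forecasts), len(forecasts)

# Toy data with K = 3 regressors; y is exactly linear in the regressors.
X = [[1, 2, 0], [2, 1, 1], [3, 4, 0], [4, 3, 2], [5, 6, 1], [6, 5, 3]]
y = [1 + 2 * a - b + 0.5 * c for a, b, c in X]
x_new = [7, 8, 2]

forecast_k2, n_models_k2 = complete_subset_forecast(X, y, x_new, k=2)
forecast_k3, n_models_k3 = complete_subset_forecast(X, y, x_new, k=3)
```

With K = 3 and k = 2, three models are averaged; with k = 3, only the full model remains. A quantile version in the spirit of CSQR would replace `ols_fit` with a quantile regression fit for each quantile of interest.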
We contribute to the volatility literature and extend the previous studies by combining standard HAR volatility models and the CSLR (linear) and CSQR (quantile) approaches into the HAR-CSLR and HAR-CSQR volatility models. The accuracy of HAR-CSLR and HAR-CSQR is empirically tested on a sample of market indices of four large markets, the U.S. S&P 500, Japan’s NIKKEI 225, China’s SSEC Composite and Europe’s STOXX 50. We find that HAR-CSLR and HAR-CSQR models tend to outperform popular benchmark HAR models. Meligkotsidou et al. (2019b) is most closely related to this research; however, we differ in several aspects.
First, as a baseline model, we rely on several popular HAR models instead of the simple autoregressive (AR) model specification (as in Meligkotsidou et al., 2019b). We therefore contribute to the extensive literature on various HAR model types (e.g., Degiannakis et al., 2020, Patton and Sheppard, 2015). Among four standard HAR models, we cannot find (given our sample) one that performs best for all market indices, loss functions and forecast horizons, yet the approach of Patton and Sheppard (2015) tends to consistently perform well.
Second, Meligkotsidou et al. (2019b) studied monthly levels of volatility. However, most of the existing volatility studies are concerned with day-ahead volatility forecasts, as shorter forecast horizons are more relevant for managing positions of risky assets. We therefore study the accuracy of the HAR-CSLR and HAR-CSQR models for 1-, 2-, …, 22-day-ahead forecasts. This way we provide evidence on the usefulness of HAR-CSLR and HAR-CSQR forecasts for daily forecast periods as well as for periods leading to a monthly level of volatility that corresponds to approximately 22 trading days. We are therefore able to evaluate whether HAR-CSLR and HAR-CSQR have greater merit for shorter or longer forecasting horizons. We find that when HAR-CSLR and HAR-CSQR models are evaluated via the mean square error loss function, they tend to outperform benchmark HAR models for longer forecast horizons. On the other hand, the asymmetric loss function tends to suggest that HAR-CSLR and HAR-CSQR work best for forecast horizons of up to nine days.
Third, Meligkotsidou et al. (2019a, 2019b) relied on macroeconomic data. While macroeconomic variables might be useful even when modeling short-term volatility (e.g., Lyócsa et al., 2020b), in such settings, it is difficult to evaluate the role CSLR and CSQR play in volatility forecasting, as part of the increased accuracy might be due to the use of macroeconomic variables and not the modeling approach per se. Our approach is different in that we use only data that can be retrieved from the price series, and we also model realized variance directly.
Fourth, previous studies have employed three- or five-quantile aggregation, which has left an open question of whether aggregating across more quantiles is beneficial in practice. This factor has important practical implications, as the need to predict multiple quantiles in a CSQR framework increases the computation time. We therefore present a seven-quantile method and empirically compare three aggregation techniques (three-, five- and seven-quantile methods). As our results suggest that the five- and seven-quantile methods perform similarly, we recommend the use of the more parsimonious five-quantile method.
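Aggregating quantile forecasts into a point forecast amounts to a weighted average of the predicted quantiles. The sketch below is only an illustration: it uses Tukey's trimean, a well-known three-quantile estimator, as the example weighting scheme. The exact quantile levels and weights of the three-, five- and seven-quantile methods compared in the paper are not given in this excerpt, so the dictionary below and the toy forecasts are assumptions.

```python
# Tukey's trimean: weights on the 25th, 50th and 75th percentiles.
# This is an assumed example scheme, not necessarily the paper's.
TRIMEAN_WEIGHTS = {0.25: 0.25, 0.50: 0.50, 0.75: 0.25}

def aggregate_quantiles(quantile_forecasts, weights):
    """Combine quantile forecasts into one point forecast.

    `quantile_forecasts` maps a quantile level to its forecast and
    `weights` maps the same levels to weights that sum to one.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-12:
        raise ValueError("weights must sum to one")
    missing = set(weights) - set(quantile_forecasts)
    if missing:
        raise ValueError(f"missing quantile forecasts: {sorted(missing)}")
    return sum(w * quantile_forecasts[q] for q, w in weights.items())

# Hypothetical quantile forecasts of next-day realized variance.
q_forecasts = {0.25: 1.0, 0.50: 2.0, 0.75: 4.0}
point_forecast = aggregate_quantiles(q_forecasts, TRIMEAN_WEIGHTS)
```

Because realized variance is right-skewed, the upper quantile pulls the aggregated forecast above the median in this toy example. Adding more quantiles only requires a longer weight dictionary, but each extra quantile means another set of quantile regressions to estimate.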
The remainder of this paper is organized as follows. In Section 2, we describe our sample and realized measures. In Section 3, we outline the benchmark models and the HAR-CSLR and HAR-CSQR models, along with the forecasting procedure and forecast evaluation framework. Section 4 reports our results, and Section 5 concludes and highlights further lines of research.
Section snippets
Data sources
To demonstrate the HAR-CSLR and HAR-CSQR models, we use data on four market indices corresponding to the largest markets, namely, the S&P 500 (U.S.), STOXX 50 (Europe), NIKKEI 225 (Japan), and SSEC Composite (China). Our sample starts in January and ends in March . The four indices track the development of the largest stock markets in the world, which, according to the most recent data, account for approximately two thirds of the world's total market capitalization.
Standard predictive HAR models
The standard HAR-RV model of Corsi (2009) predicts volatility using a set of three volatility components: the average level of volatility over the past one (daily), five (weekly) or twenty-two (monthly) trading days. We next use the HAR-CJ model specification (similar to that of Andersen et al., 2007; Degiannakis et al., 2020; Sévi, 2014), which considers the continuous and jump volatility components. Specifically, , which is the
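The three HAR-RV components described above can be sketched as follows. This is a minimal illustration of the standard one-day-ahead HAR-RV regression, assuming a plain OLS fit on a simulated series; the function names, the simulated data and the single forecast horizon are assumptions, and the paper's own notation and multi-horizon setup are not reproduced here.

```python
import random

def har_components(rv, t):
    """Daily, weekly (5-day) and monthly (22-day) average RV ending at day t."""
    daily = rv[t]
    weekly = sum(rv[t - 4:t + 1]) / 5
    monthly = sum(rv[t - 21:t + 1]) / 22
    return [daily, weekly, monthly]

def ols_fit(X, y):
    """OLS coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def fit_har(rv):
    """Regress next-day RV on the daily, weekly and monthly components."""
    ts = range(21, len(rv) - 1)
    X = [har_components(rv, t) for t in ts]
    y = [rv[t + 1] for t in ts]
    return ols_fit(X, y)

def har_forecast(beta, rv):
    """One-day-ahead forecast from the most recent components."""
    d, w, m = har_components(rv, len(rv) - 1)
    return beta[0] + beta[1] * d + beta[2] * w + beta[3] * m

# Simulated persistent, positive "realized variance" series (made-up data).
rng = random.Random(0)
rv = [1.0]
for _ in range(299):
    rv.append(0.1 + 0.8 * rv[-1] + 0.2 * rng.random())

beta = fit_har(rv)            # [intercept, daily, weekly, monthly]
next_day = har_forecast(beta, rv)
```

The monthly window requires 22 observations, so the first usable regression row starts at index 21. Extensions such as HAR-CJ would add further columns (for example, continuous and jump components) to the regressor list.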
Baseline results
The summary of the realized measures reported in Table 1 and the visualization of the series in Figs. 2 and 3 show well-known stylized facts of the volatility series. Realized volatility is skewed to the right and highly persistent. Even at the 22nd lag, the autocorrelation coefficient is 0.28 for the S&P and 0.09 for the NIKKEI. Moreover, the continuous component shows greater persistence, and the signed jumps show almost no persistence. Also notable is the high
Conclusion
We extend the heterogeneous autoregressive (HAR) model of Corsi (2009) and its recent extensions (e.g., Andersen et al., 2012; Patton and Sheppard, 2015) via the complete subset regression of Elliott et al. (2013) (HAR-CSLR model) and the complete subset quantile regression of Meligkotsidou et al. (2019a, 2019b) (HAR-CSQR model). The HAR-CSLR and HAR-CSQR models are empirically tested to predict the 1- to 22-day-ahead realized variance of four major market indices, the
CRediT authorship contribution statement
Štefan Lyócsa: Conceptualization, Methodology, Software, Writing - original draft. Daniel Stašek: Data curation, Writing - original draft, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (54)
- et al. The distribution of realized stock return volatility. Journal of Financial Economics (2001).
- et al. Jump-robust volatility estimation using nearest neighbor truncation. Journal of Econometrics (2012).
- et al. Combining high frequency data with non-linear models for forecasting energy market volatility. Expert Systems with Applications (2016).
- Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics (1986).
- et al. Exploiting the errors: A simple approach for improved volatility forecasting. Journal of Econometrics (2016).
- et al. Complete subset regressions. Journal of Econometrics (2013).
- et al. Asymmetric volatility in equity markets around the world. The North American Journal of Economics and Finance (2019).
- et al. Simple robust averages of forecasts: Some empirical results. International Journal of Forecasting (2008).
- Novel volatility forecasting using deep learning–long short term memory recurrent neural networks. Expert Systems with Applications (2019).
- et al. Forecasting the Chinese stock volatility across global stock markets. Physica A: Statistical Mechanics and its Applications (2019).
- Fear of the coronavirus and the stock markets. Finance Research Letters.
- Impact of macroeconomic news, regulation and hacking exchange markets on the volatility of bitcoin. Journal of Economic Dynamics and Control.
- Volatility forecasting of non-ferrous metal futures: Covariances, covariates or combinations? Journal of International Financial Markets, Institutions and Money.
- Predicting risk in energy markets: Low-frequency data still matter. Applied Energy.
- Are low-frequency data really uninformative? A forecasting combination perspective. The North American Journal of Economics and Finance.
- Are macroeconomic variables useful for forecasting the distribution of US inflation? International Journal of Forecasting.
- Properties of range-based volatility estimators. International Review of Financial Analysis.
- Volatility forecast comparison using imperfect volatility proxies. Journal of Econometrics.
- Optimal combinations of realised volatility estimators. International Journal of Forecasting.
- Forecasting volatility with a stacked model based on a hybridized artificial neural network. Expert Systems with Applications.
- Forecasting the volatility of crude oil futures using intraday data. European Journal of Operational Research.
- Realised variance forecasting under Box–Cox transformations. International Journal of Forecasting.
- A novel UMIDAS–SVQR model with mixed frequency investor sentiment for predicting stock market volatility. Expert Systems with Applications.
- Roughing it up: Including jump components in the measurement, modeling, and forecasting of return volatility. The Review of Economics and Statistics.
- The distribution of realized exchange rate volatility. Journal of the American Statistical Association.
- Limit theorems for bipower variation in financial econometrics. Econometric Theory.
- Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica.