Improving stock market volatility forecasts with complete subset linear and quantile HAR models

doi:10.1016/j.eswa.2021.115416

Expert Systems with Applications

Volume 183, 30 November 2021, 115416

https://doi.org/10.1016/j.eswa.2021.115416

Highlights

• We design complete subset linear (CSLR) and quantile regression (CSQR) HAR models.
• Our approach is on the border of machine learning and standard econometric literature.
• Our sample covers four broad market indices: S&P 500, NIKKEI 225, STOXX 50, SSEC.
• CSLR and CSQR tend to outperform benchmark models: HAR-RV, HAR-SJ, HAR-SV, HAR-CJ.

Abstract

Volatility forecasting plays an integral role in risk management, investments and security valuation for all assets with uncertain future payoffs. We enrich the literature by presenting computationally intensive variations of the heterogeneous autoregressive (HAR) volatility model: the complete subset linear/quantile regression HAR models, HAR-CSLR and HAR-CSQR. Predictions of 1- to 22-day-ahead volatility of four major market indices (NIKKEI 225, S&P 500, SSEC and STOXX 50) show that both models tend to outperform several benchmark HAR models. Forecasting accuracy improvements tend to stabilize for longer forecasting horizons: e.g., five-day-ahead improvements range from 6.57% (SSEC) to 35.62% (NIKKEI 225) and from 3.99% (STOXX) to 9.54% for the mean square error (MSE) and QLIKE loss functions, respectively. In terms of MSE, the HAR-CSQR model outperforms several standard benchmark HAR models across all market indices and forecast horizons.

Introduction

Fluctuations of asset prices are essential for pricing traded assets. It follows that volatility forecasting models play a key role in risk management, investments and security valuation for all assets with uncertain future payoffs. The volatility of asset returns tends to be highly persistent. From early on, this has led researchers to use autoregressive volatility models (e.g., the autoregressive fractionally integrated moving average (ARFIMA) model of Granger & Joyeux, 1980) and the popular generalized autoregressive conditional heteroskedasticity (GARCH) class of latent volatility models proposed by Bollerslev (1986). Many variations of these models have been proposed.² However, as high-frequency data have become more accessible, interest has switched to volatility models that use directly observable and measurable volatility, that is, realized volatility, defined as the sum of squared intraday returns (e.g., Andersen et al., 2001a, 2001b; Barndorff-Nielsen and Shephard, 2002). Among these models, the heterogeneous autoregressive (HAR) model of Corsi (2009) has become the new standard ‘to beat’. Compared to GARCH models, the HAR model is simple to estimate, as the realized volatility is explained by past daily, weekly and monthly historical volatility components within a linear regression framework. Still, the model accurately captures the long-memory property of volatility. HAR models are also easy to interpret and to extend with new explanatory variables. For example, Patton and Sheppard (2015) included realized semivariances, Andersen et al. (2012) disentangled realized volatility into its jump and continuous components, Bollerslev et al. (2016) exploited the measurement error of the volatility, and Corsi and Reno (2009) and Horpestad et al. (2019) included asymmetric returns.

Another strand of the literature reports forecasting improvements from machine learning techniques used to predict market volatility, such as gradient descent boosting, random forest, support vector (quantile) machine, artificial neural network, and deep learning (e.g., Baruník and Křehlík, 2016; Liu, 2019; Ramos-Pérez et al., 2019; Xu et al., 2019). Our approach is on the border of the two strands of the literature, as we use a data-driven approach that is easily tractable and interpretable if necessary.³ Specifically, the complete subset approach is a combination of feature engineering and ensemble methods used in machine learning, while the HAR model is a standard econometric approach to predict market volatility.⁴

In this paper, we propose a volatility forecasting model that makes use of realized volatility quantile forecasts to determine the expected volatility. Specifically, we first predict several quantiles of the volatility density; then, we aggregate quantiles of volatility into the expected (point estimate) volatility forecast. Although in our research the volatility density is not the goal but rather a tool for achieving point forecasts of volatility, our research is related to the scant literature on volatility density forecasting. Berkowitz (2001) argued that volatility density forecasts are important requirements for stress testing of banks, calculating margin requirements and pricing financial derivatives. Volatility density forecasts are usually based on parametric models (e.g., Corsi et al., 2008). Our nonparametric approach to density forecasts is motivated by the work of Gaglianone and Lima (2012), who used quantile regression to predict the distribution of U.S. unemployment and survey forecasts. Quantile regression is appealing, as it does not require the assumption of a parametric form of the conditional distribution of the variable of interest. Unsurprisingly, several others have followed this line of thought. For example, Manzan and Zerom (2013) predicted the distribution of U.S. inflation, and Pedersen (2015) predicted the distribution of equity and bond market returns. Moreover, Meligkotsidou et al. (2019a) have used nonparametric density forecasts to predict the equity market premium.⁵

As tail events are, by definition, rare, predicting quantiles, specifically the tails of a target variable’s distribution, might lead to large forecast errors, which may reduce the accuracy of predictions built from quantile forecasts. To reduce forecast errors, Meligkotsidou et al. (2019a) adapted the idea of complete subset linear regression (CSLR) of Elliott et al. (2013). CSLR forecasts the target variable by aggregating forecasts from all model specifications that are possible given a set of $K$ explanatory variables and a number of admissible independent variables $k \leq K$. For example, given a linear regression framework and $K = 4$ potential explanatory variables, one could create $4$ forecasts using models with one independent variable ($k = 1$), $6$ forecasts with $k = 2$, $4$ with $k = 3$ and $1$ with $k = 4$; in general, the number of models for a given $k$ is $K! / ((K - k)!\, k!)$. These forecasts can then be combined via a suitable function to obtain the point forecast of interest. Meligkotsidou et al. (2019a) adapted this approach for quantile regression, leading to complete subset quantile regression (CSQR), which we also exploit in this study. Finally, Meligkotsidou et al. (2019b) predicted the monthly level of the U.S. S&P 500 market index volatility using the CSQR approach by expanding a first-order autoregressive model with macroeconomic variables.
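The complete subset idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `complete_subset_forecast`, the synthetic data, and the use of a simple equal-weighted average as the combining function are assumptions made for illustration only. For each subset of $k$ out of $K$ regressors, the sketch fits an ordinary least squares model with an intercept and then averages the resulting forecasts.

```python
from itertools import combinations

import numpy as np


def complete_subset_forecast(X, y, x_new, k):
    """Average the OLS forecasts from every k-variable subset of the K columns of X.

    Returns the combined forecast and the number of models that were fitted.
    The equal-weighted mean is an illustrative choice of combining function.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    n, K = X.shape
    if not 1 <= k <= K:
        raise ValueError("k must satisfy 1 <= k <= K")

    forecasts = []
    for subset in combinations(range(K), k):
        cols = list(subset)
        # Design matrix: intercept plus the selected regressors.
        A = np.column_stack([np.ones(n), X[:, cols]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        forecasts.append(beta[0] + x_new[cols] @ beta[1:])
    return float(np.mean(forecasts)), len(forecasts)


# Synthetic example with K = 4 regressors (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 0.5 + X @ np.array([0.4, 0.3, 0.2, 0.1]) + 0.1 * rng.normal(size=200)
x_new = X[-1]

for k in range(1, 5):
    forecast, n_models = complete_subset_forecast(X[:-1], y[:-1], x_new, k)
    print(k, n_models, round(forecast, 4))
```

With $K = 4$, the loop fits 4, 6, 4 and 1 models for $k = 1, \dots, 4$, which matches the counts given in the text. A quantile version would replace the least squares step with a quantile regression fit for each quantile of interest.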

We contribute to the volatility literature and extend the previous studies by combining standard HAR volatility models and the CSLR (linear) and CSQR (quantile) approaches into the HAR-CSLR and HAR-CSQR volatility models. The accuracy of HAR-CSLR and HAR-CSQR is empirically tested on a sample of market indices of four large markets, the U.S. S&P 500, Japan’s NIKKEI 225, China’s SSEC Composite and Europe’s STOXX 50. We find that HAR-CSLR and HAR-CSQR models tend to outperform popular benchmark HAR models. Meligkotsidou et al. (2019b) is most closely related to this research; however, we differ in several aspects.

First, as a baseline model, we rely on several popular HAR models instead of the simple autoregressive (AR) model specification (as in Meligkotsidou et al., 2019b). We therefore contribute to the extensive literature on various HAR model types (e.g., Degiannakis et al., 2020, Patton and Sheppard, 2015). Among four standard HAR models, we cannot find (given our sample) one that performs best for all market indices, loss functions and forecast horizons, yet the approach of Patton and Sheppard (2015) tends to consistently perform well.

Second, Meligkotsidou et al. (2019b) studied monthly levels of volatility.⁶ However, most of the existing volatility studies are concerned with day-ahead volatility forecasts, as shorter forecast horizons are more relevant for managing positions of risky assets. We therefore study the accuracy of the HAR-CSLR and HAR-CSQR models for 1-, 2-, …, 22-day-ahead forecasts. This way we provide evidence on the usefulness of HAR-CSLR and HAR-CSQR forecasts for daily forecast periods as well as for periods leading to a monthly level of volatility that corresponds to approximately 22 trading days. We are therefore able to evaluate whether HAR-CSLR and HAR-CSQR have greater merit for shorter or longer forecasting horizons. We find that, when HAR-CSLR and HAR-CSQR models are evaluated via the mean square error loss function, they tend to outperform benchmark HAR models for longer forecast horizons. On the other hand, the asymmetric loss function tends to suggest that HAR-CSLR and HAR-CSQR work best for forecast horizons of up to nine days.

Third, Meligkotsidou et al. (2019a, 2019b) relied on macroeconomic data. While macroeconomic variables might be useful even when modeling short-term volatility (e.g., Lyócsa et al., 2020b), in such settings it is difficult to evaluate the role CSLR and CSQR play in volatility forecasting, as part of the increased accuracy might be due to the use of macroeconomic variables and not the modeling approach per se. Our approach is different in that we use only data that can be retrieved from the price series, and we also model realized variance directly.⁷

Fourth, previous studies have employed three- or five-quantile aggregation, which has left an open question of whether aggregating across more quantiles is beneficial in practice. This factor has important practical implications, as the need to predict multiple quantiles in a CSQR framework increases the computation time. We therefore present a seven-quantile method and empirically compare three aggregation techniques (three-, five- and seven-quantile methods). As our results suggest that the five- and seven-quantile methods perform similarly, we recommend the use of the more parsimonious five-quantile method.
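The aggregation step, in which several quantile forecasts are collapsed into one point forecast, can be sketched as a weighted sum. The snippet below is an illustration under stated assumptions, not the paper's procedure: the helper `aggregate_quantiles` is hypothetical, the quantile forecasts are made-up numbers, and the weights shown are those of Tukey's trimean (0.25, 0.5, 0.25 on the 25th, 50th and 75th percentiles), used here as one familiar three-quantile scheme. The weights of the five- and seven-quantile methods are not given in this excerpt and are therefore not reproduced.

```python
import numpy as np


def aggregate_quantiles(quantile_forecasts, weights):
    """Combine quantile forecasts (last axis = quantiles) into point forecasts."""
    q = np.asarray(quantile_forecasts, dtype=float)
    w = np.asarray(weights, dtype=float)
    if q.shape[-1] != w.shape[0]:
        raise ValueError("need exactly one weight per quantile")
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must sum to one")
    return q @ w


# Tukey's trimean weights for the 0.25, 0.50 and 0.75 quantiles.
TRIMEAN_WEIGHTS = (0.25, 0.5, 0.25)

# Hypothetical quantile forecasts of realized variance for two days.
forecasts = np.array([
    [0.8, 1.0, 1.6],
    [0.5, 0.7, 1.1],
])
point_forecasts = aggregate_quantiles(forecasts, TRIMEAN_WEIGHTS)
print(point_forecasts)
```

Because each additional quantile requires its own set of complete subset quantile regressions, the length of the weight vector maps directly onto computation time, which is the trade-off discussed above.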

The remainder of this paper is organized as follows. In Section 2, we describe our sample and realized measures. In Section 3, we outline the benchmark models and the HAR-CSLR and HAR-CSQR models, along with the forecasting procedure and forecast evaluation framework. Section 4 reports our results, and Section 5 concludes and highlights further lines of research.

Section snippets

Data sources

To demonstrate the HAR-CSLR and HAR-CSQR models, we use data on four market indices corresponding to the largest markets, namely, the S&P 500 (U.S.), STOXX 50 (Europe), NIKKEI 225 (Japan), and SSEC Composite (China). Our sample starts in January 2003 and ends in March 2020. The four indices track the development of the largest stock markets in the world, which, given the most recent data, correspond to approximately two thirds of the total market capitalization of the world.⁸

Standard predictive HAR models

The standard HAR-RV model of Corsi (2009) predicts volatility using a set of three volatility components: the average level of volatility over the past one ($RV_{t}^{D}$, daily), five ($RV_{t}^{W}$, weekly) or twenty-two ($RV_{t}^{M}$, monthly) trading days:

$$RV_{t,H} = \beta_{0} + \beta_{1} RV_{t}^{D} + \beta_{2} RV_{t}^{W} + \beta_{3} RV_{t}^{M} + u_{t,H}$$

We next use the HAR-CJ model specification (similar to that of Andersen et al., 2007; Degiannakis et al., 2020; Sévi, 2014), which considers the continuous and jump ($JC_{t}$) volatility components. Specifically, $JC_{t}$, which is the
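As an illustration of the HAR-RV regression stated at the start of this snippet, the following sketch builds the daily, weekly and monthly components as trailing averages over 1, 5 and 22 days and estimates the coefficients by ordinary least squares. Several details are assumptions rather than the paper's specification: the function names `har_components` and `fit_har_rv` are hypothetical, the target $RV_{t,H}$ is taken to be the average realized variance over days $t+1, \dots, t+H$, and the input series is simulated.

```python
import numpy as np


def har_components(rv):
    """Return daily, weekly and monthly trailing means of rv (NaN until available)."""
    rv = np.asarray(rv, dtype=float)
    csum = np.cumsum(np.insert(rv, 0, 0.0))

    def trailing_mean(window):
        out = np.full(rv.shape, np.nan)
        out[window - 1:] = (csum[window:] - csum[:-window]) / window
        return out

    return trailing_mean(1), trailing_mean(5), trailing_mean(22)


def fit_har_rv(rv, horizon=1):
    """OLS estimates (beta_0, beta_1, beta_2, beta_3) of the HAR-RV regression.

    Assumption: the target is the mean of rv over days t+1, ..., t+horizon.
    """
    rv = np.asarray(rv, dtype=float)
    n = rv.shape[0]
    daily, weekly, monthly = har_components(rv)
    csum = np.cumsum(np.insert(rv, 0, 0.0))

    # Usable origins: monthly component defined (t >= 21), target fully observed.
    t = np.arange(21, n - horizon)
    target = (csum[t + 1 + horizon] - csum[t + 1]) / horizon
    A = np.column_stack([np.ones(t.shape[0]), daily[t], weekly[t], monthly[t]])
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    return beta


# Simulated positive series standing in for realized variance (not real data).
rng = np.random.default_rng(1)
rv = rng.gamma(shape=2.0, scale=1.0, size=500)
beta = fit_har_rv(rv, horizon=5)
print(beta)
```

The three regressors are the inputs from which complete subsets would be drawn in a HAR-CSLR or HAR-CSQR setting, together with any additional components such as jumps or semivariances.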

Baseline results

The summary of the realized measures reported in Table 1 and the visualization of the series in Fig. 2 and Fig. 3 show well-known stylized facts of the volatility series. Realized volatility is skewed to the right and highly persistent. Even at the 22nd lag, the autocorrelation coefficient is 0.28 for the S&P 500 and 0.09 for the NIKKEI 225. Moreover, the continuous component ($MV_{t}$) shows greater persistence, and the signed jumps ($SJ_{t}$) show almost no persistence. Also notable is the high

Conclusion

We extend the heterogeneous autoregressive (HAR) model of Corsi (2009) and its recent extensions (e.g., Andersen et al., 2012; Patton and Sheppard, 2015) via the complete subset regression of Elliott et al. (2013) (HAR-CSLR model) and the complete subset quantile regression of Meligkotsidou et al. (2019a, 2019b) (HAR-CSQR model). The HAR-CSLR and HAR-CSQR models are empirically tested to predict the 1- to 22-day-ahead realized variance of four major market indices, the

CRediT authorship contribution statement

Štefan Lyócsa: Conceptualization, Methodology, Software, Writing - original draft. Daniel Stašek: Data curation, Writing - original draft, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (54)

Andersen, Torben G., et al. (2001). The distribution of realized stock return volatility. Journal of Financial Economics.

Andersen, Torben G., et al. (2012). Jump-robust volatility estimation using nearest neighbor truncation. Journal of Econometrics.

Baruník, Jozef, et al. (2016). Combining high frequency data with non-linear models for forecasting energy market volatility. Expert Systems with Applications.

Bollerslev, Tim (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics.

Bollerslev, Tim, et al. (2016). Exploiting the errors: A simple approach for improved volatility forecasting. Journal of Econometrics.

Elliott, Graham, et al. (2013). Complete subset regressions. Journal of Econometrics.

Horpestad, Jone B., et al. (2019). Asymmetric volatility in equity markets around the world. The North American Journal of Economics and Finance.

Jose, Victor Richmond R., et al. (2008). Simple robust averages of forecasts: Some empirical results. International Journal of Forecasting.

Liu, Yang (2019). Novel volatility forecasting using deep learning–long short term memory recurrent neural networks. Expert Systems with Applications.

Liu, Jing, et al. (2019). Forecasting the Chinese stock volatility across global stock markets. Physica A: Statistical Mechanics and its Applications.

Lyócsa, Štefan, et al. (2020). Fear of the coronavirus and the stock markets. Finance Research Letters.

Lyócsa, Štefan, et al. (2020). Impact of macroeconomic news, regulation and hacking exchange markets on the volatility of bitcoin. Journal of Economic Dynamics and Control.

Lyócsa, Štefan, et al. (2017). Volatility forecasting of non-ferrous metal futures: Covariances, covariates or combinations? Journal of International Financial Markets, Institutions and Money.

Lyócsa, Štefan, et al. (2021). Predicting risk in energy markets: Low-frequency data still matter. Applied Energy.

Ma, Feng, et al. (2018). Are low-frequency data really uninformative? A forecasting combination perspective. The North American Journal of Economics and Finance.

Manzan, Sebastiano, et al. (2013). Are macroeconomic variables useful for forecasting the distribution of US inflation? International Journal of Forecasting.

Molnár, Peter (2012). Properties of range-based volatility estimators. International Review of Financial Analysis.

Patton, Andrew J. (2011). Volatility forecast comparison using imperfect volatility proxies. Journal of Econometrics.

Patton, Andrew J., et al. (2009). Optimal combinations of realised volatility estimators. International Journal of Forecasting.

Ramos-Pérez, Eduardo, et al. (2019). Forecasting volatility with a stacked model based on a hybridized artificial neural network. Expert Systems with Applications.

Sévi, Benoît (2014). Forecasting the volatility of crude oil futures using intraday data. European Journal of Operational Research.

Taylor, Nick (2017). Realised variance forecasting under Box–Cox transformations. International Journal of Forecasting.

Xu, Qifa, et al. (2019). A novel UMIDAS–SVQR model with mixed frequency investor sentiment for predicting stock market volatility. Expert Systems with Applications.

Andersen, Torben G., et al. (2007). Roughing it up: Including jump components in the measurement, modeling, and forecasting of return volatility. The Review of Economics and Statistics.

Andersen, Torben G., et al. (2001). The distribution of realized exchange rate volatility. Journal of the American Statistical Association.

Barndorff-Nielsen, Ole E., et al. (2006). Limit theorems for bipower variation in financial econometrics. Econometric Theory.

Barndorff-Nielsen, Ole E., et al. (2008). Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica.


¹: Lyócsa appreciates the support from the VEGA project, Slovakia, “Volatility density forecasts on financial markets”, under Grant no. 1/0257/18.
