Abstract
The meta-elliptical t copula with noncentral t GARCH univariate margins is studied as a model for asset allocation. A method of parameter estimation is deployed that is nearly instantaneous for large dimensions. The expected shortfall of the portfolio distribution is obtained by combining simulation with a parametric approximation for speed enhancement. A simulation-based method for mean-expected shortfall portfolio optimization is developed. An extensive out-of-sample backtest exercise is conducted, and comparisons are made with common asset allocation techniques.
M.S. Paolella—Financial support by the Swiss National Science Foundation (SNSF) through project #150277 is gratefully acknowledged.
References
Aas, K.: Pair-copula constructions for financial applications: a review. Econometrics 4(4), 1–15 (2016). Article 43
Aas, K., Czado, C., Frigessi, A., Bakken, H.: Pair-Copula Constructions of Multiple Dependence. Insur. Math. Econ. 44, 182–198 (2009)
Abdous, B., Genest, C., Rémillard, B.: Dependence Properties of Meta-Elliptical Distributions. In: Duchesne, P., Rémillard, B. (eds.) Statistical Modeling and Analysis for Complex Data Problems. Springer Verlag, New York (2005). Chapter 1
Adcock, C.J.: Asset pricing and portfolio selection based on the multivariate extended skew-student-\(t\) distribution. Ann. Oper. Res. 176(1), 221–234 (2010)
Adcock, C.J.: Mean-variance-skewness efficient surfaces, Stein’s lemma and the multivariate extended skew-student distribution. Eur. J. Oper. Res. 234(2), 392–401 (2014)
Adcock, C.J., Eling, M., Loperfido, N.: Skewed distributions in finance and actuarial science: a preview. Eur. J. Financ. 21(13–14), 1253–1281 (2015)
Aielli, G.P.: Dynamic conditional correlation: on properties and estimation. J. Bus. Econ. Stat. 31(3), 282–299 (2013)
Aielli, G.P., Caporin, M.: Fast clustering of GARCH processes via Gaussian mixture models. Math. Comput. Simul. 94, 205–222 (2013)
Asai, M.: Heterogeneous asymmetric dynamic conditional correlation model with stock return and range. J. Forecast. 32(5), 469–480 (2013)
Ausin, M.C., Lopes, H.F.: Time-varying joint distribution through copulas. Comput. Stat. Data Anal. 54, 2383–2399 (2010)
Bailey, D.H., Borwein, J.M., López de Prado, M., Zhu, Q.J.: Pseudo-mathematics and financial charlatanism: the effects of backtest overfitting on out-of-sample performance. Not. Am. Math. Soc. 61(5), 458–471 (2014)
Bailey, D.H., Borwein, J.M., López de Prado, M., Zhu, Q.J.: The probability of backtest overfitting. J. Comput. Finan. (2016). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2840838
Bali, T.G., Engle, R.F.: The intertemporal capital asset pricing model with dynamic conditional correlations. J. Monetary Econ. 57(4), 377–390 (2010)
Fundamental Review of the Trading Book: A Revised Market Risk Framework. Consultative document, Bank for International Settlements, Basel (2013)
Bauwens, L., Rombouts, J.V.K.: Bayesian clustering of many GARCH models. Econometric Rev. 26(2), 365–386 (2007)
Billio, M., Caporin, M.: A generalized dynamic conditional correlation model for portfolio risk evaluation. Math. Comput. Simul. 79(8), 2566–2578 (2009)
Billio, M., Caporin, M., Gobbo, M.: Flexible dynamic conditional correlation multivariate GARCH models for asset allocation. Appl. Financ. Econ. Lett. 2(2), 123–130 (2006)
Bloomfield, T., Leftwich, R., Long, J.: Portfolio strategies and performance. J. Financ. Econ. 5, 201–218 (1977)
Bollerslev, T.: A conditional heteroskedastic time series model for speculative prices and rates of return. Rev. Econ. Stat. 69, 542–547 (1987)
Bollerslev, T.: Modeling the coherence in short-run nominal exchange rates: a multivariate Generalized ARCH approach. Rev. Econ. Stat. 72, 498–505 (1990)
Broda, S.A., Haas, M., Krause, J., Paolella, M.S., Steude, S.C.: Stable mixture GARCH models. J. Econometrics 172(2), 292–306 (2013)
Broda, S.A., Paolella, M.S.: Expected Shortfall for Distributions in Finance. In: Čížek, P., Härdle, W., Weron, R. (eds.) Statistical Tools for Finance and Insurance. Springer, Berlin (2011)
Brooks, C., Burke, S.P., Persand, G.: Benchmarks and the accuracy of GARCH model estimation. Int. J. Forecast. 17(1), 45–56 (2001)
Brown, S. J., Hwang, I., In, F.: Why Optimal Diversification Cannot Outperform Naive Diversification: Evidence from Tail Risk Exposure (2013)
Bücher, A., Jäschke, S., Wied, D.: Nonparametric tests for constant tail dependence with an application to energy and finance. J. Econometrics 187(1), 154–168 (2015)
Cambanis, S., Huang, S., Simons, G.: On the theory of elliptically contoured distributions. J. Multivar. Anal. 11(3), 368–385 (1981)
Caporin, M., McAleer, M.: Ten things you should know about the dynamic conditional correlation representation. Econometrics 1(1), 115–126 (2013)
Cappiello, L., Engle, R.F., Sheppard, K.: Asymmetric dynamics in the correlations of global equity and bond returns. J. Financ. Econometrics 4(4), 537–572 (2006)
Chicheportiche, R., Bouchaud, J.-P.: The joint distribution of stock returns is not elliptical. Int. J. Theor. Appl. Financ. 15(3), 1250019 (2012)
Christoffersen, P., Errunza, V., Jacobs, K., Langlois, H.: Is the potential for international diversification disappearing? a dynamic copula approach. Rev. Financ. Stud. 25, 3711–3751 (2012)
Clare, A., O’Sullivan, N., Sherman, M.: Benchmarking UK mutual fund performance: the random portfolio experiment. Int. J. Financ. (2015). https://www.ucc.ie/en/media/research/centreforinvestmentresearch/RandomPortfolios.pdf
Demarta, S., McNeil, A.J.: The \(t\) copula and related copulas. Int. Stat. Rev. 73(1), 111–129 (2005)
DeMiguel, V., Garlappi, L., Uppal, R.: Optimal versus naive diversification: how inefficient is the \(1/N\) portfolio strategy? Rev. Financ. Stud. 22(5), 1915–1953 (2009)
DeMiguel, V., Martin-Utrera, A., Nogales, F.J.: Size matters: optimal calibration of shrinkage estimators for portfolio selection. J. Bank. Financ. 37(8), 3018–3034 (2013)
Devroye, L.: Non-Uniform Random Variate Generation. Springer Verlag, New York (1986)
Ding, P.: On the conditional distribution of the multivariate \(t\) distribution. Am. Stat. 70(3), 293–295 (2016)
Ding, Z., Granger, C.W.J., Engle, R.F.: A long memory property of stock market returns and a new model. J. Empir. Financ. 1(1), 83–106 (1993)
Edwards, T., Lazzara, C.J.: Equal-Weight Benchmarking: Raising the Monkey Bars. Technical report, McGraw Hill Financial (2014)
Embrechts, P.: Copulas: a personal view. J. Risk Insur. 76, 639–650 (2009)
Embrechts, P., McNeil, A., Straumann, D.: Correlation and dependency in risk management: properties and pitfalls. In: Dempster, M.A.H. (ed.) Risk Management: Value at Risk and Beyond, pp. 176–223. Cambridge University Press, Cambridge (2002)
Engle, R.: Anticipating Correlations: A New Paradigm for Risk Management. Princeton University Press, Princeton (2009)
Engle, R., Kelly, B.: Dynamic equicorrelation. J. Bus. Econ. Stat. 30(2), 212–228 (2012)
Engle, R.F.: Dynamic conditional correlation: a simple class of multivariate generalized autoregressive conditional heteroskedasticity models. J. Bus. Econ. Stat. 20, 339–350 (2002)
Engle, R.F., Sheppard, K.: Theoretical and Empirical Properties of Dynamic Conditional Correlation Multivariate GARCH. NBER Working Papers 8554, National Bureau of Economic Research Inc (2001)
Fang, H.B., Fang, K.T., Kotz, S.: The meta-elliptical distribution with given marginals. J. Multivar. Anal. 82, 1–16 (2002)
Fang, K.-T., Kotz, S., Ng, K.-W.: Symmetric Multivariate and Related Distributions. Chapman & Hall, London (1989)
Fink, H., Klimova, Y., Czado, C., Stöber, J.: Regime switching vine copula models for global equity and volatility indices. Econometrics 5(1), 1–38 (2017). Article 3
Francq, C., Zakoïan, J.-M.: Maximum likelihood estimation of pure GARCH and ARMA-GARCH processes. Bernoulli 10(4), 605–637 (2004)
Francq, C., Zakoïan, J.-M.: GARCH Models: Structure Statistical Inference and Financial Applications. John Wiley & Sons Ltd., Chichester (2010)
Gambacciani, M., Paolella, M.S.: Robust normal mixtures for financial portfolio allocation. Econometrics and Statistics (2017, forthcoming)
Haas, M., Krause, J., Paolella, M.S., Steude, S.C.: Time-varying mixture GARCH models and asymmetric volatility. North Am. J. Econ. Financ. 26, 602–623 (2013)
Haas, M., Mittnik, S., Paolella, M.S.: Mixed normal conditional heteroskedasticity. J. Financ. Econometrics 2(2), 211–250 (2004)
He, C., Teräsvirta, T.: Properties of moments of a family of GARCH processes. J. Econometrics 92(1), 173–192 (1999a)
He, C., Teräsvirta, T.: Statistical properties of the asymmetric power ARCH model. In: Engle, R.F., White, H. (eds) Cointegration, Causality, and Forecasting. Festschrift in Honour of Clive W. J. Granger, pp. 462–474. Oxford University Press (1999b)
Heyde, C.C., Kou, S.G.: On the controversy over tailweight of distributions. Oper. Res. Lett. 32, 399–408 (2004)
Hough, J.: Monkeys are better stockpickers than you’d think. Barron’s magazine (2014)
Hurst, S.: The characteristic function of the student \(t\) distribution. Financial Mathematics Research Report FMRR006-95, Australian National University, Canberra (1995). http://wwwmaths.anu.edu.au/research.reports/srr/95/044/
Jagannathan, R., Ma, T.: Risk reduction in large portfolios: why imposing the wrong constraints helps. J. Financ. 58(4), 1651–1683 (2003)
Jondeau, E.: Asymmetry in tail dependence of equity portfolios. Computat. Stat. Data Anal. 100, 351–368 (2016)
Jondeau, E., Rockinger, M.: Conditional volatility, skewness, and kurtosis: existence, persistence, and comovements. J. Econ. Dyn. Control 27, 1699–1737 (2003)
Jondeau, E., Rockinger, M.: The Copula-GARCH model of conditional dependencies: an international stock market application. J. Int. Money Financ. 25, 827–853 (2006)
Jondeau, E., Rockinger, M.: On the importance of time variability in higher moments for asset allocation. J. Financ. Econometrics 10(1), 84–123 (2012)
Karanasos, M., Kim, J.: A re-examination of the asymmetric power ARCH model. J. Empir. Financ. 13, 113–128 (2006)
Kelker, D.: Distribution theory of spherical distributions and a location-scale parameter generalization. Sankhyā, Series A 32(4), 419–430 (1970)
Kiefer, J., Wolfowitz, J.: Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Stat. 27(4), 887–906 (1956)
Kogon, S.M., Williams, D.B.: Characteristic function based estimation of stable parameters. In: Adler, R.J., Feldman, R.E., Taqqu, M.S. (eds) A Practical Guide to Heavy Tails, pp. 311–335. Birkhauser Boston Inc. (1998)
Krause, J., Paolella, M.S.: A fast, accurate method for value at risk and expected shortfall. Econometrics 2, 98–122 (2014)
Kuester, K., Mittnik, S., Paolella, M.S.: Value-at-risk prediction: a comparison of alternative strategies. J. Financ. Econometrics 4, 53–89 (2006)
Ling, S., McAleer, M.: Necessary and sufficient moment conditions for the garch(\(r, s\)) and asymmetric power garch(\(r, s\)) models. Econometric Theor. 18(3), 722–729 (2002)
Ma, J., Nelson, C.R., Startz, R.: Spurious inference in the GARCH(1,1) model when it is weakly identified. Stud. Nonlinear Dyn. Econometrics 11(1), 1–27 (2006). Article 1
Markowitz, H.: Portfolio Selection. J. Financ. 7(1), 77–91 (1952)
McAleer, M., Chan, F., Hoti, S., Lieberman, O.: Generalized autoregressive conditional correlation. Econometric Theor. 24(6), 1554–1583 (2008)
McNeil, A.J., Frey, R., Embrechts, P.: Quantitative Risk Management: Concepts, Techniques, and Tools. Princeton University Press, Princeton (2005)
McNeil, A.J., Frey, R., Embrechts, P.: Quantitative Risk Management: Concepts, Techniques, and Tools. Princeton University Press, Princeton (2015). Revised edition
Mittnik, S., Paolella, M.S.: Prediction of financial downside risk with heavy tailed conditional distributions. In: Rachev, S.T. (ed.) Handbook of Heavy Tailed Distributions in Finance. Elsevier Science, Amsterdam (2003)
Mittnik, S., Paolella, M.S., Rachev, S.T.: Stationarity of stable power-GARCH processes. J. Econometrics 106, 97–107 (2002)
Nguyen, H.T.: On evidential measures of support for reasoning with integrate uncertainty: a lesson from the ban of P-values in statistical inference. In: Huynh, V.-N., Inuiguchi, M., Le, B., Le, B.N., Denoeux, T. (eds.) 5th International Symposium on Integrated Uncertainty in Knowledge Modeling and Decision Making IUKM 2016, pp. 3–15. Springer, Cham (2016)
Nolan, J. P.: Stable Distributions - Models for Heavy Tailed Data. Birkhäuser, Boston (2015, forthcoming). Chapter 1 online
Paolella, M.S.: Intermediate Probability: A Computational Approach. John Wiley & Sons, Chichester, West Sussex, England (2007)
Paolella, M.S.: Multivariate asset return prediction with mixture models. Eur. J. Financ. 21, 1–39 (2013)
Paolella, M.S.: Fast methods for large-scale non-elliptical portfolio optimization. Ann. Financ. Econ. 09(02), 1440001 (2014)
Paolella, M.S.: Stable-GARCH models for financial returns: fast estimation and tests for stability. Econometrics 4(2), 25 (2016). Article 25
Paolella, M.S.: The univariate collapsing method for portfolio optimization. Econometrics 5(2), 1–33 (2017). Article 18
Paolella, M.S., Polak, P.: ALRIGHT: Asymmetric LaRge-Scale (I)GARCH with hetero-tails. Int. Rev. Econ. Financ. 40, 282–297 (2015a)
Paolella, M.S., Polak, P.: COMFORT: A common market factor non-gaussian returns model. J. Econometrics 187(2), 593–605 (2015b)
Paolella, M.S., Polak, P.: Portfolio Selection with Active Risk Monitoring. Research paper, Swiss Finance Institute (2015c)
Paolella, M.S., Polak, P.: Density and Risk Prediction with Non-Gaussian COMFORT Models (2017). Submitted
Paolella, M.S., Polak, P., Walker, P.: A Flexible Regime-Switching Model for Asset Returns (2017). Submitted
Patton, A.J.: A review of copula models for economic time series. J. Multivar. Anal. 110, 4–18 (2012)
Pelletier, D.: Regime switching for dynamic correlations. J. Econometrics 131, 445–473 (2006)
Righi, M.B., Ceretta, P.S.: Individual and flexible expected shortfall backtesting. J. Risk Model Valid. 7(3), 3–20 (2013)
Righi, M.B., Ceretta, P.S.: A comparison of expected shortfall estimation models. J. Econ. Bus. 78, 14–47 (2015)
Samorodnitsky, G., Taqqu, M.S.: Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman & Hall, London (1994)
Scherer, M.: CDO pricing with nested Archimedean copulas. Quant. Financ. 11, 775–787 (2011)
Shaw, W.T.: Monte Carlo Portfolio Optimization for General Investor Risk-Return Objectives and Arbitrary Return Distributions: a Solution for Long-only Portfolios (2010)
So, M.K.P., Yip, I.W.H.: Multivariate GARCH models with correlation clustering. J. Forecast. 31(5), 443–468 (2012)
Song, D.-K., Park, H.-J., Kim, H.-M.: A note on the characteristic function of multivariate \(t\) distribution. Commun. Stat. Appl. Methods 21(1), 81–91 (2014)
Stoyanov, S., Samorodnitsky, G., Rachev, S., Ortobelli, S.: Computing the portfolio conditional value-at-risk in the alpha-stable case. Probab. Math. Statistics 26, 1–22 (2006)
Sutradhar, B.C.: On the characteristic function of multivariate student \(t\)-distribution. Can. J. Stat. 14(4), 329–337 (1986)
Tse, Y.K., Tsui, A.K.C.: A multivariate generalized autoregressive conditional heteroscedasticity model with time-varying correlations. J. Bus. Econ. Stat. 20(3), 351–362 (2002)
Vargas, G.A.: An asymmetric block dynamic conditional correlation multivariate GARCH model. Philippine Stat. 55(1–2), 83–102 (2006)
Winker, P., Maringer, D.: The convergence of estimators based on heuristics: theory and application to a GARCH model. Comput. Stat. 24(3), 533–550 (2009)
Ledoit, O., Wolf, M.: Honey, I shrunk the sample covariance matrix: problems in mean-variance optimization. J. Portfolio Management 30(4), 110–119 (2004)
Zhou, T., Chan, L.: Clustered dynamic conditional correlation multivariate GARCH model. In: Song, I.-Y., Eder, J., Nguyen, T. M. (eds) Proceedings of the 10th International Conference Data Warehousing and Knowledge Discovery, DaWaK 2008, Turin, Italy, 2–5 September 2008, pp. 206–216 (2008)
Zolotarev, V.M.: One Dimensional Stable Distributions (Translations of Mathematical Monograph, Vol. 65). American Mathematical Society, Providence, RI (1986). Translated from the original Russian version (1983)
Appendices
A Parametric Forms for Approximating the Distribution of \(\widetilde{\mathbf {R}}_{P}\)
We detail here the four candidate parametric structures mentioned in Sect. 2.6.
A.1 The Noncentral Student’s t
The first is the location-scale \(\mathrm {NCT}^{*}\) distribution (3). As location \(\mu \) and scale \(\sigma \) parameters need to be estimated along with the \(\mathrm {NCT}^{*}\) shape parameters, we compute
Starting values are taken to be the 50% trimmed mean for \(\mu \) (i.e., the lower and upper 25% of the sorted sample are ignored) and, for \(\sigma \), the value \((s^2/2)^{1/2}\) obtained from (6) with \(\nu =4\) and \(\gamma =0\), where \(s^2\) denotes the sample variance. Two box constraints, \(q_{0.25}< \widehat{\mu }<q_{0.75}\) and \((s^2/10)^{1/2}< \widehat{\sigma } < s\), are imposed during estimation, where \(q_{\xi }\) denotes the \(\xi \)th sample quantile. The mean and variance are then determined from (6), while the ES is obtained essentially instantaneously from the KP method via a table-lookup procedure, noting that, for any probability \(0<\xi <1\), \(\mathrm{ES}({{P}}_{t+1 \mid t, \mathbf {w}}; \xi ) = \mu + \sigma \mathrm{ES}({{Z}}_{t+1 \mid t, \mathbf {w}}; \xi )\).
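For illustration, the location-scale ES identity and the starting values just described can be sketched as follows. A central Student's t stands in for the standardized \(\mathrm{NCT}^{*}\) (whose quantiles and ES the KP table lookup would deliver), so `es_t` below is a simplified stand-in for the KP method, not the method itself.

```python
import numpy as np
from scipy import stats

def es_t(xi, nu):
    # Lower-tail expected shortfall E[Z | Z < q_xi] of a standard Student's t
    # (stand-in for the standardized NCT*; requires nu > 1).
    q = stats.t.ppf(xi, nu)
    return -stats.t.pdf(q, nu) * (nu + q ** 2) / ((nu - 1.0) * xi)

def es_location_scale(mu, sigma, xi, nu):
    # ES(P; xi) = mu + sigma * ES(Z; xi) for P = mu + sigma * Z.
    return mu + sigma * es_t(xi, nu)

def starting_values(x):
    # mu: 50% trimmed mean (drop the lower and upper 25% of the sorted sample);
    # sigma: (s^2/2)^(1/2), since Var(Z) = nu/(nu - 2) = 2 at nu = 4, gamma = 0.
    xs = np.sort(np.asarray(x))
    n = len(xs)
    mu0 = xs[n // 4 : n - n // 4].mean()
    sigma0 = np.sqrt(np.var(x, ddof=1) / 2.0)
    return mu0, sigma0
```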
A.2 The Generalized Asymmetric t
The second candidate is the five-parameter generalized asymmetric t, or GAt distribution. The pdf is
where \(d,\nu ,\theta \in {\mathbb R}_{> 0}\), and \(K^{-1}=(\theta ^{-1} + \theta ) d^{-1} \nu ^{1{/}d} B(1{/}d,\nu )\). It is noteworthy because its limiting cases include the generalized exponential distribution (GED), and hence the Laplace and normal, while the Student’s \(t\) (and, thus, the Cauchy) distributions are special cases. For \(\theta >1\) (\(\theta <1\)), the distribution is skewed to the right (left), while for \(\theta =1\), it is symmetric. See Paolella (2007, p. 273) for further details. The \(r\)th moment, for integer \(r\) such that \(0 \le r < \nu d\), is
i.e., the mean is
when \(\nu d >1\), and the variance is computed in the obvious way. The cumulative distribution function (cdf) of \(Z \sim \mathrm{GA}t(d,\nu ,\theta )\) is
where the incomplete beta ratio is given by
For computing the ES, we require \({\mathbb E}[Z^r\mid Z<c]\) for \(r=1\). For \(c<0\), this is given by
The existence of the mean and the ES requires \(\nu d >1\).
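The GAt density itself is not reproduced above, so the following sketch is an assumption: it uses the kernel form implied by the stated normalizing constant \(K\) (argument \(-z\theta \) on the left, \(z/\theta \) on the right) and numerically checks that the density integrates to one and is right-skewed for \(\theta >1\).

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import beta as betafn

def gat_pdf(z, d, nu, theta):
    # Assumed GAt kernel, chosen to be consistent with the stated constant
    #   K^{-1} = (theta^{-1} + theta) d^{-1} nu^{1/d} B(1/d, nu):
    # the left tail is compressed by theta and the right tail stretched by it,
    # so theta > 1 skews the density to the right.
    z = np.asarray(z, dtype=float)
    K = d / ((1.0 / theta + theta) * nu ** (1.0 / d) * betafn(1.0 / d, nu))
    arg = np.where(z < 0.0, -z * theta, z / theta)
    return K * (1.0 + arg ** d / nu) ** (-(nu + 1.0 / d))

# Sanity checks: unit mass, and for theta > 1 the mass above zero is
# theta^2 / (theta^2 + 1) > 1/2.
total, _ = quad(lambda z: gat_pdf(z, 2.0, 2.0, 1.5), -np.inf, np.inf)
upper, _ = quad(lambda z: gat_pdf(z, 2.0, 2.0, 1.5), 0.0, np.inf)
```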
A.3 The Two-Component Mixture GAt
With five parameters (including location and scale), the GAt is a rather flexible distribution. However, as our third choice, greater accuracy can be obtained by using a two-component mixture of GAt distributions, with mixing parameters \(0<\lambda _1<1\) and \(\lambda _2=1-\lambda _1\). This 11-parameter construction is extraordinarily flexible and should be quite adequate for modeling the portfolio distribution. We also assume that the true distribution is not (single-component) GAt, and that the distributional class of two-component GAt mixtures is identified. Its pdf and cdf are just weighted sums of GAt pdfs and cdfs, respectively, so that evaluation of the cdf is no more involved than that of the GAt. Let P denote a K-component mixGAt distribution, where each component has the three aforementioned shape parameters, as well as location \(u_i\) and scale \(c_i\), \(i=1,\dots , K\). First observe that the cdf of the mixture is given by
where the ith cdf mixture component is given as the closed-form expression in (45), so that a quantile can be found by simple one-dimensional root searching. Similar to calculations for the ES of mixture distributions in Broda and Paolella (2011), the ES of the mixture is given by
where \(q_{P,\xi }\) is the \(\xi \)-quantile of P, \(S_{1,Z_j}\) is given in (46), and \(F_{Z_j}\) is the cdf of the GAt random variable given in (45), both functions evaluated with the parameters \(d_j\), \(\nu _j\), and \(\theta _j\) from the mixture components, \(Z_j\), for \(j=1,\dots ,K\).
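The quantile-by-root-search and mixture-ES computations can be sketched as follows. Location-scale Student's t components stand in for the GAt components, whose closed-form cdf (45) and truncated mean (46) would be used in practice; all parameter values below are illustrative.

```python
import numpy as np
from scipy import stats, optimize, integrate

# Illustrative two-component location-scale mixture.
lam = np.array([0.7, 0.3])    # mixing weights, positive and summing to one
nu  = np.array([5.0, 3.0])    # component shape parameters
u   = np.array([0.0, -0.5])   # component locations u_i
c   = np.array([1.0, 2.0])    # component scales c_i

def mix_cdf(x):
    # Mixture cdf: weighted sum of component cdfs.
    return float(sum(l * stats.t.cdf((x - ui) / ci, df)
                     for l, df, ui, ci in zip(lam, nu, u, c)))

def mix_quantile(xi):
    # Simple one-dimensional root search on the mixture cdf.
    return optimize.brentq(lambda x: mix_cdf(x) - xi, -1e4, 1e4)

def mix_es(xi):
    # ES(xi) = E[X | X <= q_xi] = (1/xi) * int_{-inf}^{q_xi} x f(x) dx.
    q = mix_quantile(xi)
    pdf = lambda x: sum(l * stats.t.pdf((x - ui) / ci, df) / ci
                        for l, df, ui, ci in zip(lam, nu, u, c))
    val, _ = integrate.quad(lambda x: x * pdf(x), -np.inf, q)
    return val / xi
```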
While estimation of the two-component mixture GAt is straightforward using standard ML estimation, it was found that this occasionally resulted in an inferior, possibly bimodal fit that visually did not agree well with a kernel density estimate. This artefact arises from the nature of mixture distributions and the problems associated with the likelihood. We present a method that leads, with far higher probability, to a successful model fit, based on a so-called augmented likelihood procedure. The technique was first presented in Broda et al. (2013) and is adapted to the mixture GAt as follows.
Let \(f(x; \varvec{\theta })=\sum _{i=1}^K \lambda _i f_i(x; \varvec{\theta }_i)\) be the univariate pdf of a K-component (finite) mixture distribution with component weights \(\lambda _1, \ldots , \lambda _K\) positive and summing to one. The likelihood function is \(\ell ^{\star }(\varvec{\theta }; \mathbf {x}) = \sum _{t=1}^{T} \log f(x_t; \varvec{\theta })\) (49),
where \(\mathbf {x}=(x_1,\dots ,x_T)'\) is the sequence of evaluation points, and \(\varvec{\theta } = (\varvec{\lambda }, \varvec{\theta }_1, \dots , \varvec{\theta }_K)'\) is the vector of all model parameters. Assuming that the \(\varvec{\theta }_i\) include location and scale parameters, \(\ell ^{\star }\) is plagued with “spikes”: it is an unbounded function with multiple maxima; see, e.g., Kiefer and Wolfowitz (1956). Hence, numerical maximization of (49) is prone (depending on factors such as the starting values and the employed numerical optimization method) to result in inaccurate, if not arbitrary, estimates. To avoid this problem, an augmented likelihood function is proposed in Broda et al. (2013). The idea is to remove unbounded states from the likelihood function by introducing a smoothing (shrinkage) term that, at its maximum, drives all components to act as one (irrespective of their assigned mixing weights), such that the mixture loses its otherwise inherently large flexibility. The suggested augmented likelihood function is given by
where \(\kappa \ge 0\) controls the shrinkage strength. If all component densities \(f_i\) are of the same type, larger values of \(\kappa \) lead to more similar parameter estimates across components, with identical estimates in the limit as \(\kappa \rightarrow \infty \). At \(\kappa =0\), (50) reduces to (49).
is termed the augmented likelihood estimator (ALE) and is consistent as \(T \rightarrow \infty \). By varying \(\kappa \), smooth density estimates can be enforced, even for small sample sizes. For the mixGAt with \(K=2\) and 250 observations, we find \(\kappa =10\) to be an adequate choice, which, in our empirical testing, guaranteed unimodal estimates in all cases, while still offering enough flexibility for accurate density fits, significantly better than those obtained with the single-component GAt.
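A minimal sketch of the augmented-likelihood idea follows, using a two-component normal mixture as a stand-in for the mixGAt. The quadratic shrinkage penalty is an assumed, illustrative choice, not the exact term of Broda et al. (2013); it likewise drives the components toward each other, ruling out the unbounded likelihood "spikes" for large \(\kappa \).

```python
import numpy as np
from scipy import stats, optimize

def neg_aug_loglik(p, x, kappa):
    # Penalized negative log-likelihood for a two-component normal mixture
    # (stand-in for the mixGAt). p = (logit weight, mean1, log-scale1,
    # mean2, log-scale2); the penalty shrinks the components together.
    lam = 1.0 / (1.0 + np.exp(-p[0]))             # mixing weight in (0, 1)
    m1, ls1, m2, ls2 = p[1], p[2], p[3], p[4]
    f = (lam * stats.norm.pdf(x, m1, np.exp(ls1))
         + (1.0 - lam) * stats.norm.pdf(x, m2, np.exp(ls2)))
    penalty = kappa * ((m1 - m2) ** 2 + (ls1 - ls2) ** 2)  # assumed form
    return penalty - np.sum(np.log(f))

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 180), rng.normal(0.0, 3.0, 70)])
p0 = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
res = optimize.minimize(neg_aug_loglik, p0, args=(x, 10.0),
                        method="Nelder-Mead",
                        options={"maxiter": 5000, "fatol": 1e-8, "xatol": 1e-8})
```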
A.4 The Asymmetric Stable Paretian
The fourth candidate we consider is the asymmetric non-Gaussian stable Paretian distribution, hereafter stable, with location \(\mu \), scale c, tail index \(\alpha \), and asymmetry parameter \(\beta \). We use the parametrization such that the mean, assuming \(\alpha >1\), is given by \(\mu \). (In Nolan 2015 and his accompanying software, this corresponds to the first parametrization; see also Zolotarev 1986 and Samorodnitsky and Taqqu 1994.)
This might at first seem like an odd candidate, given the historical difficulties in its estimation and the potentially problematic calculation of the ES, owing to the extraordinarily heavy-tailed nature of the distribution and the problems associated with calculating the density far into the tails; see, e.g., Paolella (2016) and the references therein. We circumvent both of these issues as follows. For estimation, we make use of the sample-characteristic-function estimator of Kogon and Williams (1998), which is fast to calculate and results in estimates very close in performance to the MLE. We use the function provided in John Nolan’s STABLE toolbox, which saves us the implementation; simulation easily confirms that his procedure is both correct and very fast. For the ES calculation, we first need the appropriate quantile, which is also implemented in Nolan’s toolbox. The ES integral can then be computed using the integral expression given in Stoyanov et al. (2006), which cleverly avoids integration into the tail.
This procedure, while feasible, is still too time consuming for our purposes. Instead, we use the same procedure employed in Krause and Paolella (2014) to generate a (massive) table in two dimensions (\(\alpha \) and \(\beta \)) that delivers the VaR (the required quantile) and the ES essentially instantaneously and with very high accuracy. There is one caveat with its use that requires remedying. It is well known, and as simulations quickly verify, that estimation of the asymmetry parameter \(\beta \) is subject to the most variation for any particular sample size. Relative to asymmetric Student’s \(t\) distributions, the extremely heavy tails of the stable distribution induce observations in small samples that have a relatively large impact on the estimation of \(\beta \). This is particularly acute when using a relatively small sample size of \(T=250\). As such, we recommend use of a simple shrinkage estimator with target zero and weight \(s_{\beta }\), namely \(\widehat{\beta } = s_{\beta } \widehat{\beta }_{\mathrm{MLE}}\). Some trial and error suggests \(s_{\beta }=0.3\) to be a reasonable choice for \(T=250\).
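The table lookup with \(\beta \)-shrinkage can be sketched as follows. The grid entries below are placeholders: in practice each cell would hold the standardized stable VaR or ES at the chosen level, precomputed once via the Stoyanov et al. (2006) integral expression.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Two-dimensional (alpha, beta) lookup table with placeholder ES values.
alphas = np.linspace(1.05, 2.0, 20)
betas  = np.linspace(-1.0, 1.0, 21)
A, Bg  = np.meshgrid(alphas, betas, indexing="ij")
es_grid = -(3.0 / (A - 1.0) + 0.5 * Bg)        # placeholder ES surface

es_lookup = RegularGridInterpolator((alphas, betas), es_grid)

def stable_es(alpha_hat, beta_hat, s_beta=0.3):
    # Shrink the noisy asymmetry estimate toward zero before the lookup:
    # beta_tilde = s_beta * beta_hat, with s_beta = 0.3 for T = 250.
    beta_tilde = s_beta * beta_hat
    return float(es_lookup([[alpha_hat, beta_tilde]])[0])
```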
The motivation for using the stable is the conservative nature of the delivered ES. In particular, the first three methods discussed are all based on asymmetric variations of the Student’s \(t\) distribution, which, while clearly heavy-tailed (it does not possess a moment generating function on an open neighborhood of zero), still potentially possesses a variance, whereas the stable does not, except in the measure-zero case \(\alpha =2\). As such, and because estimation is based on a finite amount of data, the ES delivered from the stable can be expected to be larger than those from the \(t\)-based models. This might be desirable when more conservative estimates of risk are called for, and can also be expected to affect the optimized portfolio vectors and the performance of the method.
A.5 Discussion of Portfolio Tail Behavior and ES
It is worth mentioning that the actual tail behavior of financial assets is not necessarily heavy-tailed; the discussion in Heyde and Kou (2004) should settle this point. This explains why, on the one hand, exponential-tailed distributions, such as the mixed normal, can deliver excellent VaR predictions; see, e.g., Haas et al. (2004), Haas et al. (2013), and Paolella (2013); while, on the other hand, stable-Paretian GARCH models also work admirably well; see e.g., Mittnik et al. (2002) and Mittnik and Paolella (2003).
Further, observe that the tail behavior associated with \({{P}}_{t+1 \mid t, \mathbf {w}}\), given the model and the parameters, is not subject to debate: by the nature of the model we employ, it involves convolutions of (dependent) random variables with power tails and, as such, will also have power tails, and will (presumably) be in the domain of attraction of a stable law. It is, however, analytically intractable. Observe that it is fallacious to argue that, because our model involves the (noncentral) Student’s \(t\), with estimated degrees of freedom parameters (after application of the APARCH filter) above two, the convolution will have a finite variance, and so the stable distribution cannot be considered. It is crucial to realize, first, that the model we employ is wrong with probability one (and is also subject to estimation error) and, second, that if an i.i.d. set of stable data with, say, \(\alpha =1.7\) is estimated as a location-scale Student’s \(t\) model, the resulting estimated degrees of freedom will not be below two, but rather closer to four.
As such, we believe it makes sense to consider several methods of determining the ES, and compare them in terms of portfolio performance.
A.6 Comparison of Methods
The computation times for estimating the model and evaluating mean and ES for each of the four methods discussed above were compared. Based on a sample size of \(s_1=1e3\), the NCT method requires, on average, 0.20 seconds. The GAt and mixGAt require 0.23 and 1.96 seconds, respectively, while the stable requires 0.00064 seconds. Generation of \(s_1=1e6\) (1e3) samples requires approximately 2769.34 (2.91) seconds, and the empirical calculation of the mean and ES based on \(s_1=1e6\) requires approximately 0.35 seconds. The bottleneck in the generation of samples is the evaluation of the NCT quantile function in (13). In summary, it is fastest to use \(s_1=1e3\) samples and one of the parametric methods to obtain the mean and ES.
We now wish to compare the ES values delivered by each of the methods. For this, we fix the portfolio vector \(\mathbf {w}\) to be equally weighted, and use 100 moving windows of data, each of length 250, and compute, for each method, the ES corresponding to the one-day-ahead predictive distribution and the fixed equally weighted portfolio. All the ES values (the empirically determined ones as well as the parametric ones) are based on (the same) 1e5 replications. The 100 windows have starting dates 8 August 2012 to 31 December 2012 and use the \(d=30\) constituents (as of April 2013) of the Dow Jones Industrial Average index from Wharton/CRSP. The values are shown in Fig. 8, and have been smoothed to enhance visibility. As expected, the stable ES values are larger than those delivered from the t-based models and also the empirically determined ES values. The mixGAt is the most flexible distribution and approximates the empirical ES nearly exactly, though it takes the longest to compute of the four parametric methods.
A.7 Calibrating the Number of Samples \(\mathbf {s_1}\)
As stated in Sect. 2.6, we wish to determine a heuristic for selecting the number of samples, \(s_1\), from the predictive copula distribution, in order to obtain the ES. This is conducted as follows. The copula model is estimated for all non-overlapping windows of length \(T=250\) based on the 30 components of the DJIA returns available from 4 Jan. 1993 to 31 Dec. 2012 and the ES of the predictive returns distribution for the equally weighted portfolio is computed. The goal is to determine an approximation to the smallest value of \(s_1\), say \(s_1^*\), such that the sampling variance of the ES determined from the parametric methods is less than some threshold. This value \(s_1^*\) is then linked to the tail thickness of the various predictive returns distributions over the non-overlapping windows.
To compute \(s_1^*\) for a particular data set, the ES is calculated \(n=50\) times for a fixed \(s_1\), based on simulation of the predictive returns distribution, using the NCT and stable parametric forms for its approximation. This is conducted for a range of \(s_1\) values, and \(s_1^*\) is taken to be the smallest number such that the sample variance is less than a threshold value. Figure 9 shows the results for selected values of \(s_1\) in the NCT case. As expected, the ES variances across rolling windows decrease as \(s_1\) increases. As can be seen from the middle right panels, a roughly linear relationship between \(s_1\) and the logarithm of the ES variance is obtained. The analysis was also conducted for the stable Paretian distribution, resulting in a similar plot (not shown).
A simple regression approach then yields the following. For a threshold of \(\exp (-2)\),
The resulting procedure is then: from an initial set of 300 copula samples, the ES is evaluated, \(s_1\) is computed from (51), and if \(s_1>300\), an additional \(s_1-300\) samples are drawn.
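A minimal sketch of this adaptive scheme, with `draw_returns` and `required_s1` as hypothetical placeholders (the latter standing in for the fitted regression (51), which is not reproduced here), and a plain empirical 1% ES used purely for illustration:

```python
import numpy as np

def adaptive_es_samples(draw_returns, required_s1, s_init=300):
    """Draw an initial batch of predictive samples, evaluate the ES,
    map it to a required sample size via `required_s1` (hypothetical
    stand-in for regression (51)), and top up the sample if needed.

    draw_returns : callable n -> array of n predictive portfolio returns
    required_s1  : callable es -> required number of samples s1
    """
    x = draw_returns(s_init)                    # initial 300 draws
    k = max(1, int(0.01 * len(x)))              # 1% lower tail
    es = -np.mean(np.sort(x)[:k])               # empirical ES (loss > 0)
    s1 = required_s1(es)
    if s1 > s_init:
        # draw the additional s1 - 300 samples
        x = np.concatenate([x, draw_returns(s1 - s_init)])
    return x, es
```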
B The Gaussian DCC-GARCH Model
Consider a d-dimensional vector of asset returns, \(\mathbf {Y}_t = \left( Y_{t,1},Y_{t,2},\ldots ,Y_{t,d} \right) '\). The ith univariate series, \(i=1,\ldots , d\), is assumed to follow a GARCH(1,1) model, which is a special case of (5). We assume an unknown mean \(\mu _i\), so that \(Y_{t,i} - \mu _i = \epsilon _{t,i} = Z_{t,i}\sigma _{t,i}\), \(\sigma ^2_{t,i} = c_{0,i} + c_{1,i} \left( Y_{t-1,i} - \mu _i \right) ^2 + d_{1,i} \sigma ^2_{t-1,i}\), and the \(Z_{t,i}\) are i.i.d. standard normal.
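A minimal sketch of this GARCH(1,1) variance filter for one margin (function and parameter names are illustrative, not from any library; the recursion is started from the sample unconditional variance, as discussed in Appendix B.1):

```python
import numpy as np

def garch11_filter(y, mu, c0, c1, d1, sigma2_0=None):
    """Filter the conditional variances sigma^2_t of a GARCH(1,1) margin.

    y          : array of returns Y_{t,i}
    mu         : mean mu_i
    c0, c1, d1 : GARCH(1,1) parameters (c0 > 0; c1, d1 >= 0)
    sigma2_0   : starting variance; defaults to the sample variance of y - mu
    """
    eps = y - mu                      # epsilon_t = Y_t - mu
    sigma2 = np.empty(len(y))
    s2_prev = np.var(eps) if sigma2_0 is None else sigma2_0
    e2_prev = s2_prev                 # eps_0^2 = kappa * sigma_0^2, kappa = 1
    for t in range(len(y)):
        # sigma^2_t = c0 + c1 * eps_{t-1}^2 + d1 * sigma^2_{t-1}
        sigma2[t] = c0 + c1 * e2_prev + d1 * s2_prev
        e2_prev, s2_prev = eps[t] ** 2, sigma2[t]
    return sigma2
```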
B.1 Estimation Using Profile Likelihood for Each GARCH Margin
The DCC multivariate structure can be expressed as
with \(\varvec{\mu }=(\mu _1, \ldots , \mu _d)'\), \(\mathbf {D}_t^2 = \mathrm{diag}([\sigma ^2_{t,1}, \ldots , \sigma ^2_{t,d} ])\), and \(\{\mathbf {R}_t\}\) the set of \(d\times d\) matrices of time varying conditional correlations with dynamics specified by
\(t=1,\ldots , T\), where \(\varvec{\epsilon }_t = \mathbf {D}^{-1}_t\left( \mathbf {Y}_t-\varvec{\mu }\right) \). The \(\{\mathbf {Q}_t\}\) form a sequence of conditional matrices parameterized by
with \(\mathbf {S}\) the \(d\times d\) unconditional correlation matrix (Engle 2002, p. 341) of the \(\varvec{\epsilon }_{t}\), and parameters a and b are estimated via maximum likelihood conditional on estimates of all other parameters, as discussed next. Matrices \(\mathbf {S}\) and \(\mathbf {Q}_{0}\) can be estimated with the usual plug-in sample correlation based on the filtered \(\varvec{\epsilon }_{t}\); see also Bali and Engle (2010) and Engle and Kelly (2012) on estimation of the DCC model. Observe that the resulting \(\mathbf {Q}_t\) from the update in (54) will not necessarily be precisely a correlation matrix; this is the reason for the standardization in (53). See Caporin and McAleer (2013) for several critiques of this DCC construction; and Aielli (2013) for a modified DCC model, termed cDCC, with potentially better small-sample properties. The CCC model is a special case of (52), with \(a=b=0\) in (54).
The mean vector, \(\varvec{\mu }\), can be set to zero, or estimated using the sample mean of the returns, as in Engle and Sheppard (2001) and McAleer et al. (2008), though, in a more general non-Gaussian context, it is best estimated jointly with the other parameters associated with each univariate return series; see Paolella and Polak (2017). Let \(\mathbf {Y} = [\mathbf {Y}_1, \ldots , \mathbf {Y}_T]'\), and denote the set of parameters as \(\varvec{\theta }\). The log-likelihood of the remaining parameters, conditional on \(\varvec{\mu }\), is given by
Then, as in Engle (2002), adding and subtracting \(\varvec{\epsilon }_{t}' \varvec{\epsilon }_{t}\), \(\ell \) can be decomposed as the sum of volatility and correlation terms, \(\ell = \ell _V + \ell _C\), where
so that a two-step maximum likelihood estimation procedure can be applied: First, estimate the GARCH model parameters for each univariate returns series and construct the standardized residuals; second, maximize the conditional likelihood with respect to parameters a and b in (54) based on the filtered residuals from the previous step. We now discuss this first step in more detail.
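Under the Gaussian assumption, the two steps can be sketched as follows. The function names are illustrative; for simplicity, the second-step maximization of \(\ell_C\) over \((a,b)\) is done here by a coarse grid search rather than a numerical optimizer, and \(\mathbf{S}\) and \(\mathbf{Q}_0\) are both set to the plug-in sample correlation of the filtered residuals.

```python
import numpy as np
from itertools import product

def dcc_filter(eps, a, b):
    """R_t recursion: Q_t = (1-a-b) S + a eps_{t-1} eps_{t-1}' + b Q_{t-1},
    with each Q_t standardized so that R_t is a correlation matrix."""
    T, d = eps.shape
    S = np.corrcoef(eps, rowvar=False)   # plug-in estimate of S; also Q_0
    Q = S.copy()
    R = np.empty((T, d, d))
    for t in range(T):
        if t > 0:
            Q = (1 - a - b) * S + a * np.outer(eps[t-1], eps[t-1]) + b * Q
        q = np.sqrt(np.diag(Q))
        R[t] = Q / np.outer(q, q)        # standardize: unit diagonal
    return R

def fit_ab(eps, a_grid, b_grid):
    """Step two: maximize the correlation term ell_C over (a, b) on a grid,
    the filtered residuals eps being fixed from step one."""
    best, best_ll = (0.0, 0.0), -np.inf
    for a, b in product(a_grid, b_grid):
        if a < 0 or b < 0 or a + b >= 1:   # stationarity constraint
            continue
        R = dcc_filter(eps, a, b)
        ll = 0.0
        for t, e in enumerate(eps):
            _, logdet = np.linalg.slogdet(R[t])
            # ell_C contribution: -(1/2)(log|R_t| + e' R_t^{-1} e - e' e)
            ll -= 0.5 * (logdet + e @ np.linalg.solve(R[t], e) - e @ e)
        if ll > best_ll:
            best, best_ll = (a, b), ll
    return best
```

Note that setting \(a=b=0\) recovers the CCC special case, for which every \(R_t\) equals the standardized \(\mathbf{S}\).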
While Francq and Zakoïan (2004) prove the consistency and asymptotic normality of the GARCH model parameter estimates, interest here centers on their numeric computation. Dropping the subscript i, the choice of starting values for \(\hat{c}_0\), \(\hat{c}_1\), and \(\hat{d}_1\) is important, as the log-likelihood can exhibit more than one local maximum. This issue of multiple maxima has been noted by Ma et al. (2006), Winker and Maringer (2009), and Paolella and Polak (2015b), though it seems to be often ignored, and can lead to inferior forecasts and jeopardize results in applied work. This unfortunate observation might help explain the results of Brooks et al. (2001, p. 54) in their extensive comparison of econometric software. In particular, they find that, with respect to estimating just the simple normal GARCH model, “the results produced using a default application of several of the most popular econometrics packages differ considerably from one another”. Another reason for discrepant results is the choice of \(\epsilon _0\) and \(\sigma _0\) to start the GARCH(1,1) recursion, for which several suggestions exist in the literature. We take \(\hat{\sigma }^2_0\) to be the sample unconditional variance of the series, and \(\hat{\epsilon }^2_0 = \kappa \hat{\sigma }^2_0\), where
depends on the density specification \(f_Z\left( \cdot \right) \) and is stated for the more general APARCH model (5). For \(Z \sim \text {N}(0,1)\), a trivial calculation yields
In our case, with \(\delta =2\) and \(g =0\), this reduces to \(\kappa = {\mathbb E}\big [ \left| Z \right| ^2 \big ] = 1\).
Paolella and Polak (2015b) demonstrate the phenomenon of multiple maxima with a real (and typical) data set, and propose a solution that is simple to implement, making use of the profile log-likelihood (p.l.) obtained by fixing the value of \(c_0\), and using a grid of points of \(c_0\) between zero and 1.1 times the sample variance of the series. That is, for a fixed value of \(c_0\), we compute
To obtain (with high probability) the global maximum, the following procedure suggests itself: (i) Based on a set of \(c_0\) values, compute (56); (ii) take the value of \(c_0\) from the set, say \(c_0^{*}\), and its corresponding \(\widehat{\varvec{\theta }}_{\mathrm{p.l.}}(c_0^{*})\) that results in the largest log-likelihood as starting values, to (iii) estimate the full model. The finer the grid, the higher the probability of reaching the global maximum; some trials suggest that a grid of length 10 is adequate. The use of more parameters, as arise with more elaborate GARCH structures such as the APARCH formulation, or additional shape parameter(s) of a non-Gaussian distribution such as the NCT or stable Paretian, can further exacerbate the problem of multiple local maxima of the likelihood.
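A sketch of this profile-likelihood starting-value search follows. For illustration, the inner maximization of (56) over \((c_1, d_1)\) for fixed \(c_0\) is done by a coarse inner grid rather than a numerical optimizer, a Gaussian likelihood is assumed, and the function names are ours:

```python
import numpy as np

def garch_ll(eps, c0, c1, d1):
    """Gaussian GARCH(1,1) log-likelihood for demeaned returns eps,
    with the recursion started at the sample unconditional variance."""
    s2_prev = e2_prev = np.var(eps)
    ll = 0.0
    for e in eps:
        s2 = c0 + c1 * e2_prev + d1 * s2_prev
        ll += -0.5 * (np.log(2 * np.pi * s2) + e**2 / s2)
        e2_prev, s2_prev = e**2, s2
    return ll

def profile_start_values(eps, n_grid=10):
    """For each c0 on a grid between 0 and 1.1 times the sample variance,
    maximize the likelihood over (c1, d1) (coarse inner grid here), and
    return the best (c0, c1, d1) as starting values for full estimation."""
    c0_grid = np.linspace(1e-6, 1.1 * np.var(eps), n_grid)
    inner = [(c1, d1)
             for c1 in np.arange(0.0, 0.31, 0.05)
             for d1 in np.arange(0.5, 1.0, 0.05)
             if c1 + d1 < 1]                     # covariance stationarity
    best, best_ll = None, -np.inf
    for c0 in c0_grid:
        for c1, d1 in inner:
            ll = garch_ll(eps, c0, c1, d1)
            if ll > best_ll:
                best, best_ll = (c0, c1, d1), ll
    return best
```

The returned triple would then serve as the starting value for step (iii), full numerical maximization of the unrestricted likelihood.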
B.2 Remarks on DCC
One might argue that only two parameters will not be adequate for modeling the evolution of an entire correlation matrix. While this is certainly true, the models of Engle (2002) and Tse and Tsui (2002) have two strong points: First, their use is perhaps better than no parameters (as in the CCC model), and second, they allow for easy implementation and estimation. Generalizations of the simple DCC structure that allow the number of parameters to be a function of d, as well as asymmetric extensions of the DCC idea, are considered in Engle (2002) and Cappiello et al. (2006), though with a potentially very large number of parameters, the usual estimation and inferential problems arise.
Bauwens and Rombouts (2007) consider an approach in which similar series are pooled into one of a small number of clusters, such that their GARCH parameters are the same within a cluster. A related idea is to group series with respect to their correlations, generalizing the DCC model; see, e.g., Vargas (2006), Billio et al. (2006), Zhou and Chan (2008), Billio and Caporin (2009), Engle and Kelly (2012), So and Yip (2012), Aielli and Caporin (2013), and the references therein.
An alternative approach is to assume a Markov switching structure between two (or more) regimes, each of which has a CCC structure, as first proposed in Pelletier (2006), and augmented to the non-Gaussian case in Paolella et al. (2017). Such a construction implies many additional parameters, but their estimation makes use of the usual sample correlation estimator, thus avoiding the curse of dimensionality, and shrinkage estimation can be straightforwardly invoked to improve performance. The idea is that, for a given time segment, the correlations are constant, taking on one of usually two, or at most three, sets of values. This appears to be better than attempting to construct a model that allows for their variation at every point in time; the latter might be “asking too much of the data” and inundated with too many parameters. Paolella et al. (2017) demonstrate strong out-of-sample performance of their non-Gaussian Markov switching CCC model with two regimes, compared to the Gaussian CCC case, the Gaussian CCC switching case, the Gaussian DCC model, and the non-Gaussian single-component CCC of Paolella and Polak (2015b).
© 2018 Springer International Publishing AG
Paolella, M.S., Polak, P. (2018). COBra: Copula-Based Portfolio Optimization. In: Kreinovich, V., Sriboonchitta, S., Chakpitak, N. (eds) Predictive Econometrics and Big Data. TES 2018. Studies in Computational Intelligence, vol 753. Springer, Cham. https://doi.org/10.1007/978-3-319-70942-0_3
Print ISBN: 978-3-319-70941-3
Online ISBN: 978-3-319-70942-0