
COBra: Copula-Based Portfolio Optimization

  • Conference paper

Part of the book series: Studies in Computational Intelligence ((SCI,volume 753))

Abstract

The meta-elliptical t copula with noncentral t GARCH univariate margins is studied as a model for asset allocation. A method of parameter estimation is deployed that is nearly instantaneous for large dimensions. The expected shortfall of the portfolio distribution is obtained by combining simulation with a parametric approximation for speed enhancement. A simulation-based method for mean-expected shortfall portfolio optimization is developed. An extensive out-of-sample backtest exercise is conducted and comparisons made with common asset allocation techniques.

M.S. Paolella—Financial support by the Swiss National Science Foundation (SNSF) through project #150277 is gratefully acknowledged.


References

  • Aas, K.: Pair-copula constructions for financial applications: a review. Econometrics 4(4), 1–15 (2016). Article 43
  • Aas, K., Czado, C., Frigessi, A., Bakken, H.: Pair-copula constructions of multiple dependence. Insur. Math. Econ. 44, 182–198 (2009)
  • Abdous, B., Genest, C., Rémillard, B.: Dependence properties of meta-elliptical distributions. In: Duchesne, P., Rémillard, B. (eds.) Statistical Modeling and Analysis for Complex Data Problems. Springer-Verlag, New York (2005). Chapter 1
  • Adcock, C.J.: Asset pricing and portfolio selection based on the multivariate extended skew-student-\(t\) distribution. Ann. Oper. Res. 176(1), 221–234 (2010)
  • Adcock, C.J.: Mean-variance-skewness efficient surfaces, Stein’s lemma and the multivariate extended skew-student distribution. Eur. J. Oper. Res. 234(2), 392–401 (2014)
  • Adcock, C.J., Eling, M., Loperfido, N.: Skewed distributions in finance and actuarial science: a preview. Eur. J. Financ. 21(13–14), 1253–1281 (2015)
  • Aielli, G.P.: Dynamic conditional correlation: on properties and estimation. J. Bus. Econ. Stat. 31(3), 282–299 (2013)
  • Aielli, G.P., Caporin, M.: Fast clustering of GARCH processes via Gaussian mixture models. Math. Comput. Simul. 94, 205–222 (2013)
  • Asai, M.: Heterogeneous asymmetric dynamic conditional correlation model with stock return and range. J. Forecast. 32(5), 469–480 (2013)
  • Ausin, M.C., Lopes, H.F.: Time-varying joint distribution through copulas. Comput. Stat. Data Anal. 54, 2383–2399 (2010)
  • Bailey, D.H., Borwein, J.M., López de Prado, M., Zhu, Q.J.: Pseudo-mathematics and financial charlatanism: the effects of backtest overfitting on out-of-sample performance. Not. Am. Math. Soc. 61(5), 458–471 (2014)
  • Bailey, D.H., Borwein, J.M., López de Prado, M., Zhu, Q.J.: The probability of backtest overfitting. J. Comput. Financ. (2016). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2840838
  • Bali, T.G., Engle, R.F.: The intertemporal capital asset pricing model with dynamic conditional correlations. J. Monetary Econ. 57(4), 377–390 (2010)
  • Fundamental Review of the Trading Book: A Revised Market Risk Framework. Consultative document, Bank for International Settlements, Basel (2013)
  • Bauwens, L., Rombouts, J.V.K.: Bayesian clustering of many GARCH models. Econometric Rev. 26(2), 365–386 (2007)
  • Billio, M., Caporin, M.: A generalized dynamic conditional correlation model for portfolio risk evaluation. Math. Comput. Simul. 79(8), 2566–2578 (2009)
  • Billio, M., Caporin, M., Gobbo, M.: Flexible dynamic conditional correlation multivariate GARCH models for asset allocation. Appl. Financ. Econ. Lett. 2(2), 123–130 (2006)
  • Bloomfield, T., Leftwich, R., Long, J.: Portfolio strategies and performance. J. Financ. Econ. 5, 201–218 (1977)
  • Bollerslev, T.: A conditional heteroskedastic time series model for speculative prices and rates of return. Rev. Econ. Stat. 69, 542–547 (1987)
  • Bollerslev, T.: Modeling the coherence in short-run nominal exchange rates: a multivariate generalized ARCH approach. Rev. Econ. Stat. 72, 498–505 (1990)
  • Broda, S.A., Haas, M., Krause, J., Paolella, M.S., Steude, S.C.: Stable mixture GARCH models. J. Econometrics 172(2), 292–306 (2013)
  • Broda, S.A., Paolella, M.S.: Expected shortfall for distributions in finance. In: Čížek, P., Härdle, W., Weron, R. (eds.) Statistical Tools for Finance and Insurance (2011)
  • Brooks, C., Burke, S.P., Persand, G.: Benchmarks and the accuracy of GARCH model estimation. Int. J. Forecast. 17(1), 45–56 (2001)
  • Brown, S.J., Hwang, I., In, F.: Why Optimal Diversification Cannot Outperform Naive Diversification: Evidence from Tail Risk Exposure (2013)
  • Bücher, A., Jäschke, S., Wied, D.: Nonparametric tests for constant tail dependence with an application to energy and finance. J. Econometrics 187(1), 154–168 (2015)
  • Cambanis, S., Huang, S., Simons, G.: On the theory of elliptically contoured distributions. J. Multivar. Anal. 11(3), 368–385 (1981)
  • Caporin, M., McAleer, M.: Ten things you should know about the dynamic conditional correlation representation. Econometrics 1(1), 115–126 (2013)
  • Cappiello, L., Engle, R.F., Sheppard, K.: Asymmetric dynamics in the correlations of global equity and bond returns. J. Financ. Econometrics 4(4), 537–572 (2006)
  • Chicheportiche, R., Bouchaud, J.-P.: The joint distribution of stock returns is not elliptical. Int. J. Theor. Appl. Financ. 15(3), 1250019 (2012)
  • Christoffersen, P., Errunza, V., Jacobs, K., Langlois, H.: Is the potential for international diversification disappearing? A dynamic copula approach. Rev. Financ. Stud. 25, 3711–3751 (2012)
  • Clare, A., O’Sullivan, N., Sherman, M.: Benchmarking UK mutual fund performance: the random portfolio experiment. Int. J. Financ. (2015). https://www.ucc.ie/en/media/research/centreforinvestmentresearch/RandomPortfolios.pdf
  • Demarta, S., McNeil, A.J.: The \(t\) copula and related copulas. Int. Stat. Rev. 73(1), 111–129 (2005)
  • DeMiguel, V., Garlappi, L., Uppal, R.: Optimal versus naive diversification: how inefficient is the \(1/N\) portfolio strategy? Rev. Financ. Stud. 22(5), 1915–1953 (2009)
  • DeMiguel, V., Martin-Utrera, A., Nogales, F.J.: Size matters: optimal calibration of shrinkage estimators for portfolio selection. J. Bank. Financ. 37(8), 3018–3034 (2013)
  • Devroye, L.: Non-Uniform Random Variate Generation. Springer-Verlag, New York (1986)
  • Ding, P.: On the conditional distribution of the multivariate \(t\) distribution. Am. Stat. 70(3), 293–295 (2016)
  • Ding, Z., Granger, C.W.J., Engle, R.F.: A long memory property of stock market returns and a new model. J. Empir. Financ. 1(1), 83–106 (1993)
  • Edwards, T., Lazzara, C.J.: Equal-Weight Benchmarking: Raising the Monkey Bars. Technical report, McGraw Hill Financial (2014)
  • Embrechts, P.: Copulas: a personal view. J. Risk Insur. 76, 639–650 (2009)
  • Embrechts, P., McNeil, A., Straumann, D.: Correlation and dependency in risk management: properties and pitfalls. In: Dempster, M.A.H. (ed.) Risk Management: Value at Risk and Beyond, pp. 176–223. Cambridge University Press, Cambridge (2002)
  • Engle, R.: Anticipating Correlations: A New Paradigm for Risk Management. Princeton University Press, Princeton (2009)
  • Engle, R., Kelly, B.: Dynamic equicorrelation. J. Bus. Econ. Stat. 30(2), 212–228 (2012)
  • Engle, R.F.: Dynamic conditional correlation: a simple class of multivariate generalized autoregressive conditional heteroskedasticity models. J. Bus. Econ. Stat. 20, 339–350 (2002)
  • Engle, R.F., Sheppard, K.: Theoretical and Empirical Properties of Dynamic Conditional Correlation Multivariate GARCH. NBER Working Paper 8554, National Bureau of Economic Research (2001)
  • Fang, H.B., Fang, K.T., Kotz, S.: The meta-elliptical distribution with given marginals. J. Multivar. Anal. 82, 1–16 (2002)
  • Fang, K.-T., Kotz, S., Ng, K.-W.: Symmetric Multivariate and Related Distributions. Chapman & Hall, London (1989)
  • Fink, H., Klimova, Y., Czado, C., Stöber, J.: Regime switching vine copula models for global equity and volatility indices. Econometrics 5(1), 1–38 (2017). Article 3
  • Francq, C., Zakoïan, J.-M.: Maximum likelihood estimation of pure GARCH and ARMA-GARCH processes. Bernoulli 10(4), 605–637 (2004)
  • Francq, C., Zakoïan, J.-M.: GARCH Models: Structure, Statistical Inference and Financial Applications. John Wiley & Sons, Chichester (2010)
  • Gambacciani, M., Paolella, M.S.: Robust normal mixtures for financial portfolio allocation. Forthcoming in Econometrics and Statistics (2017)
  • Haas, M., Krause, J., Paolella, M.S., Steude, S.C.: Time-varying mixture GARCH models and asymmetric volatility. North Am. J. Econ. Financ. 26, 602–623 (2013)
  • Haas, M., Mittnik, S., Paolella, M.S.: Mixed normal conditional heteroskedasticity. J. Financ. Econometrics 2(2), 211–250 (2004)
  • He, C., Teräsvirta, T.: Properties of moments of a family of GARCH processes. J. Econometrics 92(1), 173–192 (1999a)
  • He, C., Teräsvirta, T.: Statistical properties of the asymmetric power ARCH model. In: Engle, R.F., White, H. (eds.) Cointegration, Causality, and Forecasting. Festschrift in Honour of Clive W. J. Granger, pp. 462–474. Oxford University Press (1999b)
  • Heyde, C.C., Kou, S.G.: On the controversy over tailweight of distributions. Oper. Res. Lett. 32, 399–408 (2004)
  • Hough, J.: Monkeys are better stockpickers than you’d think. Barron’s Magazine (2014)
  • Hurst, S.: The characteristic function of the student \(t\) distribution. Financial Mathematics Research Report FMRR006-95, Australian National University, Canberra (1995). http://wwwmaths.anu.edu.au/research.reports/srr/95/044/
  • Jagannathan, R., Ma, T.: Risk reduction in large portfolios: why imposing the wrong constraints helps. J. Financ. 58(4), 1651–1683 (2003)
  • Jondeau, E.: Asymmetry in tail dependence of equity portfolios. Comput. Stat. Data Anal. 100, 351–368 (2016)
  • Jondeau, E., Rockinger, M.: Conditional volatility, skewness, and kurtosis: existence, persistence, and comovements. J. Econ. Dyn. Control 27, 1699–1737 (2003)
  • Jondeau, E., Rockinger, M.: The Copula-GARCH model of conditional dependencies: an international stock market application. J. Int. Money Financ. 25, 827–853 (2006)
  • Jondeau, E., Rockinger, M.: On the importance of time variability in higher moments for asset allocation. J. Financ. Econometrics 10(1), 84–123 (2012)
  • Karanasos, M., Kim, J.: A re-examination of the asymmetric power ARCH model. J. Empir. Financ. 13, 113–128 (2006)
  • Kelker, D.: Distribution theory of spherical distributions and a location-scale parameter generalization. Sankhyā, Series A 32(4), 419–430 (1970)
  • Kiefer, J., Wolfowitz, J.: Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Stat. 27(4), 887–906 (1956)
  • Kogon, S.M., Williams, D.B.: Characteristic function based estimation of stable parameters. In: Adler, R.J., Feldman, R.E., Taqqu, M.S. (eds.) A Practical Guide to Heavy Tails, pp. 311–335. Birkhäuser, Boston (1998)
  • Krause, J., Paolella, M.S.: A fast, accurate method for value at risk and expected shortfall. Econometrics 2, 98–122 (2014)
  • Kuester, K., Mittnik, S., Paolella, M.S.: Value-at-risk prediction: a comparison of alternative strategies. J. Financ. Econometrics 4, 53–89 (2006)
  • Ledoit, O., Wolf, M.: Honey, I shrunk the sample covariance matrix: problems in mean-variance optimization. J. Portfolio Manag. 30(4), 110–119 (2004)
  • Ling, S., McAleer, M.: Necessary and sufficient moment conditions for the GARCH(\(r, s\)) and asymmetric power GARCH(\(r, s\)) models. Econometric Theor. 18(3), 722–729 (2002)
  • Ma, J., Nelson, C.R., Startz, R.: Spurious inference in the GARCH(1,1) model when it is weakly identified. Stud. Nonlinear Dyn. Econometrics 11(1), 1–27 (2006). Article 1
  • Markowitz, H.: Portfolio selection. J. Financ. 7(1), 77–91 (1952)
  • McAleer, M., Chan, F., Hoti, S., Lieberman, O.: Generalized autoregressive conditional correlation. Econometric Theor. 24(6), 1554–1583 (2008)
  • McNeil, A.J., Frey, R., Embrechts, P.: Quantitative Risk Management: Concepts, Techniques, and Tools. Princeton University Press, Princeton (2005)
  • McNeil, A.J., Frey, R., Embrechts, P.: Quantitative Risk Management: Concepts, Techniques, and Tools, Revised edn. Princeton University Press, Princeton (2015)
  • Mittnik, S., Paolella, M.S.: Prediction of financial downside risk with heavy tailed conditional distributions. In: Rachev, S.T. (ed.) Handbook of Heavy Tailed Distributions in Finance. Elsevier Science, Amsterdam (2003)
  • Mittnik, S., Paolella, M.S., Rachev, S.T.: Stationarity of stable power-GARCH processes. J. Econometrics 106, 97–107 (2002)
  • Nguyen, H.T.: On evidential measures of support for reasoning with integrated uncertainty: a lesson from the ban of P-values in statistical inference. In: Huynh, V.-N., Inuiguchi, M., Le, B., Le, B.N., Denoeux, T. (eds.) 5th International Symposium on Integrated Uncertainty in Knowledge Modeling and Decision Making, IUKM 2016, pp. 3–15. Springer, Cham (2016)
  • Nolan, J.P.: Stable Distributions - Models for Heavy Tailed Data. Birkhäuser, Boston (2015, forthcoming). Chapter 1 online
  • Paolella, M.S.: Intermediate Probability: A Computational Approach. John Wiley & Sons, Chichester (2007)
  • Paolella, M.S.: Multivariate asset return prediction with mixture models. Eur. J. Financ. 21, 1–39 (2013)
  • Paolella, M.S.: Fast methods for large-scale non-elliptical portfolio optimization. Ann. Financ. Econ. 9(2), 1440001 (2014)
  • Paolella, M.S.: Stable-GARCH models for financial returns: fast estimation and tests for stability. Econometrics 4(2), 25 (2016). Article 25
  • Paolella, M.S.: The univariate collapsing method for portfolio optimization. Econometrics 5(2), 1–33 (2017). Article 18
  • Paolella, M.S., Polak, P.: ALRIGHT: Asymmetric LaRge-Scale (I)GARCH with hetero-tails. Int. Rev. Econ. Financ. 40, 282–297 (2015a)
  • Paolella, M.S., Polak, P.: COMFORT: A common market factor non-Gaussian returns model. J. Econometrics 187(2), 593–605 (2015b)
  • Paolella, M.S., Polak, P.: Portfolio Selection with Active Risk Monitoring. Research paper, Swiss Finance Institute (2015c)
  • Paolella, M.S., Polak, P.: Density and Risk Prediction with Non-Gaussian COMFORT Models (2017). Submitted
  • Paolella, M.S., Polak, P., Walker, P.: A Flexible Regime-Switching Model for Asset Returns (2017). Submitted
  • Patton, A.J.: A review of copula models for economic time series. J. Multivar. Anal. 110, 4–18 (2012)
  • Pelletier, D.: Regime switching for dynamic correlations. J. Econometrics 131, 445–473 (2006)
  • Righi, M.B., Ceretta, P.S.: Individual and flexible expected shortfall backtesting. J. Risk Model Valid. 7(3), 3–20 (2013)
  • Righi, M.B., Ceretta, P.S.: A comparison of expected shortfall estimation models. J. Econ. Bus. 78, 14–47 (2015)
  • Samorodnitsky, G., Taqqu, M.S.: Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman & Hall, London (1994)
  • Scherer, M.: CDO pricing with nested Archimedean copulas. Quant. Financ. 11, 775–787 (2011)
  • Shaw, W.T.: Monte Carlo Portfolio Optimization for General Investor Risk-Return Objectives and Arbitrary Return Distributions: A Solution for Long-only Portfolios (2010)
  • So, M.K.P., Yip, I.W.H.: Multivariate GARCH models with correlation clustering. J. Forecast. 31(5), 443–468 (2012)
  • Song, D.-K., Park, H.-J., Kim, H.-M.: A note on the characteristic function of multivariate \(t\) distribution. Commun. Stat. Appl. Methods 21(1), 81–91 (2014)
  • Stoyanov, S., Samorodnitsky, G., Rachev, S., Ortobelli, S.: Computing the portfolio conditional value-at-risk in the alpha-stable case. Probab. Math. Stat. 26, 1–22 (2006)
  • Sutradhar, B.C.: On the characteristic function of multivariate student \(t\)-distribution. Can. J. Stat. 14(4), 329–337 (1986)
  • Tse, Y.K., Tsui, A.K.C.: A multivariate generalized autoregressive conditional heteroscedasticity model with time-varying correlations. J. Bus. Econ. Stat. 20(3), 351–362 (2002)
  • Vargas, G.A.: An asymmetric block dynamic conditional correlation multivariate GARCH model. Philippine Stat. 55(1–2), 83–102 (2006)
  • Winker, P., Maringer, D.: The convergence of estimators based on heuristics: theory and application to a GARCH model. Comput. Stat. 24(3), 533–550 (2009)
  • Zhou, T., Chan, L.: Clustered dynamic conditional correlation multivariate GARCH model. In: Song, I.-Y., Eder, J., Nguyen, T.M. (eds.) Proceedings of the 10th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2008, Turin, Italy, 2–5 September 2008, pp. 206–216 (2008)
  • Zolotarev, V.M.: One Dimensional Stable Distributions (Translations of Mathematical Monographs, Vol. 65). American Mathematical Society, Providence, RI (1986). Translated from the original Russian version (1983)

Author information

Correspondence to Marc S. Paolella.

Appendices

A Parametric Forms for Approximating the Distribution of \(\widetilde{\mathbf {R}}_{P}\)

We detail here the four candidate parametric structures mentioned in Sect. 2.6.

1.1 A.1 The Noncentral Student’s t

The first is the location-scale \(\mathrm {NCT}^{*}\) distribution (3). As location \(\mu \) and scale \(\sigma \) parameters need to be estimated along with the \(\mathrm {NCT}^{*}\) shape parameters, we compute

$$\begin{aligned} \arg \max _{\mu , \sigma } f_{\mathrm{NCT}} \Big ({{P}}_{t+1 \mid t, \mathbf {w}} ; \widetilde{\nu }, \widetilde{\gamma }, \mu , \sigma \Big ), \quad \widetilde{\nu }, \widetilde{\gamma } = \mathrm{KP} \Big ( {{Z}}_{t+1 \mid t, \mathbf {w}} \Big ), \quad {{Z}}_{t+1 \mid t, \mathbf {w}} = \frac{{{P}}_{t+1 \mid t, \mathbf {w}} - \mu }{\sigma }. \end{aligned}$$
(42)

Starting values are taken to be the 50% trimmed mean for \(\mu \) (i.e., the lower and upper 25% of the sorted sample are ignored) and, for \(\sigma \), the value \((s^2/2)^{1/2}\) obtained from (6) with \(\nu =4\) and \(\gamma =0\), where \(s^2\) denotes the sample variance. Two box constraints, \(q_{0.25}< \widehat{\mu }<q_{0.75}\) and \((s^2/10)^{1/2}< \widehat{\sigma } < s\), are imposed during estimation, where \(q_{\xi }\) denotes the \(\xi \)th sample quantile. The mean and variance are then determined from (6), while the ES is, via a table-lookup procedure, given essentially instantaneously from the KP method, noting that, for any probability \(0<\xi <1\), \(\mathrm{ES}({{P}}_{t+1 \mid t, \mathbf {w}}; \xi ) = \mu + \sigma \mathrm{ES}({{Z}}_{t+1 \mid t, \mathbf {w}}; \xi )\).
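To illustrate the two-parameter problem (42), the following sketch fixes \(\widetilde{\nu }\) and \(\widetilde{\gamma }\) (standing in for the KP lookup, which is not reproduced here) and maximizes the location-scale likelihood under the stated starting values and box constraints, with `scipy.stats.nct` serving as the \(\mathrm {NCT}^{*}\) density; the simulated data and all variable names are illustrative, not from the paper.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)
# simulated stand-in for the portfolio return sample P_{t+1|t,w}
x = stats.nct.rvs(df=5.0, nc=0.3, loc=0.01, scale=1.2, size=2000, random_state=rng)

nu_t, gamma_t = 5.0, 0.3  # NCT shape parameters; in the text these come from the KP lookup

# starting values: 50% trimmed mean for mu; (s^2/2)^{1/2} for sigma
mu0 = stats.trim_mean(x, 0.25)
s2 = x.var(ddof=1)
sigma0 = np.sqrt(s2 / 2)

# box constraints: q_{0.25} < mu < q_{0.75} and (s^2/10)^{1/2} < sigma < s
q25, q75 = np.quantile(x, [0.25, 0.75])
bounds = [(q25, q75), (np.sqrt(s2 / 10), np.sqrt(s2))]

def negloglik(p):
    mu, sigma = p
    return -stats.nct.logpdf(x, df=nu_t, nc=gamma_t, loc=mu, scale=sigma).sum()

res = optimize.minimize(negloglik, x0=[mu0, sigma0], bounds=bounds)
mu_hat, sigma_hat = res.x
# the ES then follows from the location-scale property:
# ES(P; xi) = mu_hat + sigma_hat * ES(Z; xi), with ES(Z; xi) from the KP table
```

With the shape parameters held fixed, the optimization is a smooth two-dimensional bound-constrained problem, which is why it can be solved essentially instantaneously.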

1.2 A.2 The Generalized Asymmetric t

The second candidate is the five-parameter generalized asymmetric t, or GAt distribution. The pdf is

$$\begin{aligned} f_{\mathrm{GA}t}(z;d,\nu ,\theta ) = K \times \left\{ \begin{array}{ll} \Bigg ( 1+\dfrac{(-z \theta )^d}{\nu }\Bigg )^{-(\nu + 1{/}d)}, &{} \text {if }z<0, \\ \Bigg ( 1+\dfrac{( z/\theta )^d}{\nu }\Bigg )^{-(\nu + 1{/}d)}, &{} \text {if }z\ge 0, \end{array} \right. \end{aligned}$$
(43)

where \(d,\nu ,\theta \in {\mathbb R}_{> 0}\), and \(K^{-1}=(\theta ^{-1} + \theta ) d^{-1} \nu ^{1{/}d} B(1{/}d,\nu )\). It is noteworthy because limiting cases include the generalized exponential (GED), and hence the Laplace and normal, while the Student’s t (and, thus, the Cauchy) distributions are special cases. For \(\theta >1\) (\(\theta <1\)) the distribution is skewed to the right (left), while for \(\theta =1\), it is symmetric. See Paolella (2007, p. 273) for further details. The rth moment for integer r such that \(0 \le r < \nu d\) is

$$\begin{aligned} {\mathbb E}\big [Z^r\big ] = \frac{I_1+I_2}{K^{-1}} = \frac{(-1)^r \theta ^{-(r+1)} + \theta ^{r+1}}{\theta ^{-1}+\theta }\frac{ \,B\big ((r+1)/d,\nu -r/d\big ) }{B\big (1{/}d,\nu \big )} \nu ^{r/d}, \end{aligned}$$

i.e., the mean is

$$\begin{aligned} {\mathbb E}\big [Z\big ] = \frac{\theta ^2 -\theta ^{-2}}{\theta ^{-1}+\theta }\frac{\,B\big (2/d,\nu -1{/}d\big )}{B\big (1{/}d,\nu \big )} \nu ^{1{/}d} \end{aligned}$$
(44)

when \(\nu d >1\), and the variance is computed in the obvious way. The cumulative distribution function (cdf) of \(Z \sim \mathrm{GA}t(d,\nu ,\theta )\) is

$$\begin{aligned} F_{\mathrm{GA}t}(z;d,\nu ,\theta ) = \left\{ \begin{array}{ll} \dfrac{\bar{B}_{L}\big (\nu ,1{/}d\big )}{1+\theta ^{2}}, &{} \text {if }z<0, \\ \dfrac{1}{1+\theta ^{2}}+\dfrac{\theta ^{2}}{1+\theta ^{2}}\,\bar{B}_{U}\big (1{/}d,\nu \big ), &{} \text {if }z\ge 0, \end{array} \right. \end{aligned}$$
(45)

where \(\bar{B}_{x}(a,b)=B_{x}(a,b)/B(a,b)\) denotes the incomplete beta ratio,

$$\begin{aligned} L=\frac{\nu }{\nu +\big (-z\theta \big )^d}, \quad \text {and} \quad U=\frac{\big (z/\theta \big )^d}{\nu +\big (z/\theta \big )^d}. \end{aligned}$$

For computing the ES, we require \({\mathbb E}[Z^r\mid Z<c]\) for \(r=1\). For \(c<0\), this is given by

$$\begin{aligned} S_r(c)=(-1)^r\nu ^{r/d}\frac{\big (1+\theta ^2\big )}{\big (\theta ^r+\theta ^{r+2}\big )}\frac{B_L\big (\nu -r/d,(r+1)/d\big )}{B_L\big (\nu ,1{/}d\big )},\quad L=\frac{\nu }{\nu +(-c\theta )^d}. \end{aligned}$$
(46)

The existence of the mean and the ES requires \(\nu d >1\).
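The closed forms (43)-(46) translate directly into code. The sketch below uses SciPy's regularized incomplete beta function for the incomplete beta ratio and validates each expression against numerical integration; function names and parameter values are illustrative.

```python
import numpy as np
from scipy.special import betainc, beta as beta_fn
from scipy.integrate import quad

def gat_pdf(z, d, nu, theta):
    # GAt density (43); K is the normalizing constant given below (43)
    K = d / ((1 / theta + theta) * nu ** (1 / d) * beta_fn(1 / d, nu))
    z = np.asarray(z, dtype=float)
    neg = (1 + (-np.minimum(z, 0) * theta) ** d / nu) ** (-(nu + 1 / d))
    pos = (1 + (np.maximum(z, 0) / theta) ** d / nu) ** (-(nu + 1 / d))
    return K * np.where(z < 0, neg, pos)

def gat_cdf(z, d, nu, theta):
    # GAt cdf (45); betainc is the regularized incomplete beta (the incomplete beta ratio)
    z = np.asarray(z, dtype=float)
    L = nu / (nu + (-np.minimum(z, 0) * theta) ** d)
    U = (np.maximum(z, 0) / theta) ** d / (nu + (np.maximum(z, 0) / theta) ** d)
    lower = betainc(nu, 1 / d, L) / (1 + theta ** 2)
    upper = 1 / (1 + theta ** 2) + theta ** 2 / (1 + theta ** 2) * betainc(1 / d, nu, U)
    return np.where(z < 0, lower, upper)

def gat_tail_mean(c, d, nu, theta):
    # S_1(c) = E[Z | Z < c] for c < 0, from (46) with r = 1; requires nu*d > 1
    L = nu / (nu + (-c * theta) ** d)
    B_num = betainc(nu - 1 / d, 2 / d, L) * beta_fn(nu - 1 / d, 2 / d)  # B_L(nu-1/d, 2/d)
    B_den = betainc(nu, 1 / d, L) * beta_fn(nu, 1 / d)                  # B_L(nu, 1/d)
    return -nu ** (1 / d) * (1 + theta ** 2) / (theta + theta ** 3) * B_num / B_den

d, nu, theta = 2.0, 3.0, 1.2
total = (quad(gat_pdf, -np.inf, 0, args=(d, nu, theta))[0]
         + quad(gat_pdf, 0, np.inf, args=(d, nu, theta))[0])
lower_mass = quad(gat_pdf, -np.inf, -0.7, args=(d, nu, theta))[0]
pm = quad(lambda z: z * gat_pdf(z, d, nu, theta), -np.inf, -1.0)[0]
pr = quad(gat_pdf, -np.inf, -1.0, args=(d, nu, theta))[0]
```

The checks at the end confirm that the density integrates to one, that the cdf matches the integrated density, and that \(S_1(c)\) agrees with the numerically computed conditional tail mean.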

1.3 A.3 The Two-Component Mixture GAt

With five parameters (including location and scale), the GAt is a rather flexible distribution. However, as our third choice, greater accuracy can be obtained by using a two-component mixture of GAt, with mixing parameters \(0<\lambda _1<1\) and \(\lambda _2=1-\lambda _1\). This 11-parameter construction is extraordinarily flexible, and should be quite adequate for modeling the portfolio distribution. We also assume that the true distribution is not a (single component) GAt, and that the distributional class of two-component GAt mixtures is identified. Its pdf and cdf are just weighted sums of GAt pdfs and cdfs, respectively, so that evaluation of the cdf is no more involved than that of the GAt. Let P denote a K-component mixGAt distribution, where each component has the three aforementioned shape parameters, as well as location \(u_i\) and scale \(c_i\), \(i=1,\dots , K\). First observe that the cdf of the mixture is given by

$$\begin{aligned} F_P(z) = \sum _{j=1}^K \lambda _j F_{\mathrm{Z}_j} \bigg (\frac{z-u_j}{c_j}; d_j, \nu _j, \theta _j \bigg ), \quad 0<\lambda _j<1, \quad \sum _{j=1}^K \lambda _j = 1, \end{aligned}$$
(47)

where the jth cdf mixture component is given by the closed-form expression in (45), so that a quantile can be found by simple one-dimensional root search. Similar to the calculations for the ES of mixture distributions in Broda and Paolella (2011), the ES of the mixture is given by

$$\begin{aligned} \mathrm{ES}_{\xi }(P)&= \frac{1}{\xi }\int _{-\infty }^{q_{P,\xi }} x f_{P}(x) \hbox {d}x = \frac{1}{\xi }\sum _{j=1}^K\lambda _j\int _{-\infty }^{q_{P,\xi }} x c_{j}^{-1} f_{Z_j}\bigg (\frac{x-u_j}{c_j}\bigg ) \hbox {d}x \nonumber \\&= \frac{1}{\xi } \sum _{j=1}^{K}\lambda _j\int _{-\infty }^{\frac{q_{P,\xi }-u_j}{c_j}} (c_jz+u_j)c_j^{-1} f_{Z_j}(z)c_j \hbox {d}z \nonumber \\&= \frac{1}{\xi } \sum _{j=1}^{K}\lambda _j\Bigg [c_j\int _{-\infty }^{\frac{q_{P,\xi }-u_j}{c_j}} z f_{Z_j}(z) \hbox {d}z+u_j\int _{-\infty }^{\frac{q_{P,\xi }-u_j}{c_j}} f_{Z_j}(z) \hbox {d}z \Bigg ]\nonumber \\&= \frac{1}{\xi } \sum _{j=1}^{K}\lambda _j F_{Z_j}\bigg (\frac{q_{P,\xi }-u_j}{c_j}\bigg ) \Bigg [c_j S_{1,Z_j}\bigg (\frac{q_{P,\xi }-u_j}{c_j}\bigg ) + u_j \Bigg ], \end{aligned}$$
(48)

where \(q_{P,\xi }\) is the \(\xi \)-quantile of P, \(S_{1,Z_j}\) is given in (46), and \(F_{Z_j}\) is the cdf of the GAt random variable given in (45), both functions evaluated with the parameters \(d_j\), \(\nu _j\), and \(\theta _j\) from the mixture components, \(Z_j\), for \(j=1,\dots ,K\).
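The component decomposition of the mixture ES is easy to verify numerically. The sketch below uses standard normal components standing in for the GAt, purely for brevity (for a standard normal, the lower partial moment has the closed form \(\int _{-\infty }^{a} z\phi (z)\,\hbox {d}z = -\phi (a)\)); the GAt version would plug in (45) and (46). The mixture quantile is found by the one-dimensional root search mentioned above, and all parameter values are illustrative.

```python
import numpy as np
from scipy import optimize, stats
from scipy.integrate import quad

lam = np.array([0.4, 0.6])     # mixing weights lambda_j
u = np.array([-0.02, 0.01])    # component locations u_j
c = np.array([0.03, 0.015])    # component scales c_j
xi = 0.05

def mix_cdf(z):
    # mixture cdf, as in (47)
    return float(np.sum(lam * stats.norm.cdf((z - u) / c)))

# xi-quantile of the mixture by one-dimensional root search
q = optimize.brentq(lambda z: mix_cdf(z) - xi, -1.0, 1.0)

# ES assembled from component lower partial moments and cdfs
a = (q - u) / c
es = (1 / xi) * np.sum(lam * (c * (-stats.norm.pdf(a)) + u * stats.norm.cdf(a)))

# direct numerical evaluation of the same tail integral, for comparison
es_check = quad(lambda z: z * np.sum(lam * stats.norm.pdf((z - u) / c) / c),
                -np.inf, q)[0] / xi
```

The two ES values agree to numerical integration accuracy, confirming the decomposition.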

While estimation of the two-component mixture GAt via standard ML is straightforward, it occasionally resulted in an inferior, possibly bimodal fit that visually disagreed with a kernel-density estimate. This artefact arises from the nature of mixture distributions and the associated problems with the likelihood surface. We therefore use a so-called augmented likelihood procedure, which leads to a successful model fit with far higher probability. The technique was first presented in Broda et al. (2013) and is adapted to the mixture GAt as follows.

Let \(f(x; \varvec{\theta })=\sum _{i=1}^K \lambda _i f_i(x; \varvec{\theta }_i)\) be the univariate pdf of a K-component (finite) mixture distribution with component weights \(\lambda _1, \ldots , \lambda _K\) positive and summing to one. The log-likelihood function is

$$\begin{aligned} \ell ^{\star }(\varvec{\theta }; \mathbf {x}) = \sum _{t=1}^T \log \sum _{i=1}^K \lambda _i f_i( x_t; \varvec{\theta }_i), \end{aligned}$$
(49)

where \(\mathbf {x}=(x_1,\dots ,x_T)'\) is the sequence of evaluation points, and \(\varvec{\theta } = (\varvec{\lambda }, \varvec{\theta }_1, \dots , \varvec{\theta }_K)'\) is the vector of all model parameters. Assuming that the \(\varvec{\theta }_i\) include location and scale parameters, \(\ell ^{\star }\) is plagued with “spikes”—it is an unbounded function with multiple maxima, see, e.g., Kiefer and Wolfowitz (1956). Hence, numerical maximization of (49) is prone (depending on factors like starting values and the employed numerical optimization method) to result in inaccurate, if not arbitrary, estimates. To avoid this problem, an augmented likelihood function is proposed in Broda et al. (2013). The idea is to remove unbounded states from the likelihood function by introducing a smoothing (shrinkage) term that, at maximum, drives all components to act as one (irrespective of their assigned mixing weight) such that the mixture loses its otherwise inherently large flexibility. The suggested augmented likelihood function is given by

$$\begin{aligned} \widetilde{\ell }(\varvec{\theta };\mathbf {x}) = \ell ^{\star }(\varvec{\theta };\mathbf {x}) + \kappa \sum _{i=1}^K \frac{1}{T} \sum _{t=1}^T \log f_i(x_t;\varvec{\theta }_i), \end{aligned}$$
(50)

where \(\kappa \), \(\kappa \ge 0\), controls the shrinkage strength. If all component densities \(f_i\) are of the same type, larger values of \(\kappa \) lead to more similar parameter estimates across components, with identical estimates in the limit, as \(\kappa \rightarrow \infty \). At \(\kappa =0\), (50) reduces to (49). The devised estimator,

$$\begin{aligned} \widehat{\varvec{\theta }}_{\text {ALE}} = \arg \max _{\varvec{\theta }} \widetilde{\ell }(\varvec{\theta };\mathbf {x}), \end{aligned}$$

is termed the augmented likelihood estimator (ALE) and is asymptotically consistent, as \(T \rightarrow \infty \). By changing \(\kappa \), smooth density estimates can be enforced, even for small sample sizes. For mixGAt with \(K=2\) and 250 observations, we obtain \(\kappa =10\) as an adequate choice, which, in our empirical testing, guaranteed unimodal estimates in all cases, while still offering enough flexibility for accurate density fits, significantly better than those obtained with the single component GAt.
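A minimal sketch of the ALE (50), assuming a two-component normal mixture in place of the mixGAt for brevity; the \(\kappa =10\) value follows the text, while the simulated data, optimizer, and parameter transformations (logit for the weight, log for the scales) are our illustrative choices.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(7)
# toy sample of T = 250 "returns", matching the window length in the text
x = np.concatenate([rng.normal(0.0, 1.0, 150), rng.normal(0.5, 2.0, 100)])

def neg_augmented(params, kappa):
    # params: (logit of lambda_1, mu_1, log sigma_1, mu_2, log sigma_2)
    lam1 = 1 / (1 + np.exp(-params[0]))
    lam = np.array([lam1, 1 - lam1])
    mu = np.array([params[1], params[3]])
    sig = np.exp([params[2], params[4]])
    ll = stats.norm.logpdf(x[:, None], mu, sig)   # T x 2 matrix of log f_i(x_t)
    lstar = np.log(np.exp(ll) @ lam).sum()        # log-likelihood (49)
    penalty = kappa * ll.mean(axis=0).sum()       # shrinkage term of (50)
    return -(lstar + penalty)

start = np.array([0.0, -0.2, np.log(x.std()), 0.2, np.log(x.std())])
res = optimize.minimize(neg_augmented, start, args=(10.0,), method="Nelder-Mead",
                        options={"maxiter": 5000, "fatol": 1e-10, "xatol": 1e-10})
```

The penalty term rewards parameter vectors for which each component, on its own, fits the whole sample reasonably well, which is exactly what rules out the degenerate spikes.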

1.4 A.4 The Asymmetric Stable Paretian

The fourth candidate we consider is the asymmetric non-Gaussian stable Paretian distribution, hereafter stable, with location \(\mu \), scale c, tail index \(\alpha \), and asymmetry parameter \(\beta \). We use the parametrization such that the mean, assuming \(\alpha >1\), is given by \(\mu \). (This corresponds to the first parametrization in Nolan 2015 and in his software; see also Zolotarev 1986 and Samorodnitsky and Taqqu 1994.)

This might at first seem like an odd candidate, given the historical difficulties in its estimation and the potentially problematic calculation of the ES, owing to the extraordinarily heavy-tailed nature of the distribution and the difficulty of computing the density far into the tails; see, e.g., Paolella (2016) and the references therein. We circumvent both of these issues as follows. We make use of the estimator based on the sample characteristic function of Kogon and Williams (1998), which is fast to calculate and results in estimates that are very close in performance to the MLE. We use the function provided in John Nolan's STABLE toolbox, sparing us the implementation; simulation easily confirms that his procedure is correct (and very fast). For the ES calculation, we first need the appropriate quantile, which is also implemented in Nolan's toolbox. The ES integral can then be computed using the integral expression given in Stoyanov et al. (2006), which cleverly avoids integration into the tail.
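The modulus part of a characteristic-function estimator in the spirit of Kogon and Williams (1998) can be sketched in a few lines: since \(|\varphi (t)|=\exp (-(c|t|)^{\alpha })\), regressing \(\log (-\log |\widehat{\varphi }(t)|)\) on \(\log t\) yields \(\alpha \) as the slope. This recovers only \(\alpha \) and c; the \(\beta \) and \(\mu \) parts of the estimator use the argument of \(\widehat{\varphi }\) and are omitted here. The grid and sample size are our choices, not the paper's.

```python
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(3)
# simulate a symmetric stable sample with alpha = 1.7 and unit scale
x = levy_stable.rvs(1.7, 0.0, size=20000, random_state=rng)

t = np.linspace(0.1, 1.0, 10)                            # grid of CF arguments
phi = np.abs(np.exp(1j * np.outer(t, x)).mean(axis=1))   # empirical CF modulus
# |phi(t)| = exp(-(c t)^alpha) for t > 0, so
# log(-log|phi(t)|) = alpha * log t + alpha * log c is linear in log t
slope, intercept = np.polyfit(np.log(t), np.log(-np.log(phi)), 1)
alpha_hat = slope
c_hat = np.exp(intercept / slope)
```

Because the regression uses only the modulus of the empirical characteristic function, it is insensitive to the choice between the common stable parametrizations, which differ only in the argument (asymmetry and location) terms.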

This procedure, while feasible, is still too time consuming for our purposes. Instead, we use the same procedure employed in Krause and Paolella (2014) to generate a (massive) table in two dimensions (\(\alpha \) and \(\beta \)) to deliver the VaR (the required quantile) and the ES, essentially instantaneously and with very high accuracy. One caveat with its use requires a remedy. It is well-known, and as simulations quickly verify, that estimation of the asymmetry parameter \(\beta \) is subject to the most variation for any particular sample size. The nature of the stable distribution, with its extremely heavy tails relative to asymmetric Student's t distributions, will induce observations in small samples that have a relatively large impact on the estimation of \(\beta \). This is particularly acute when using a relatively small sample size of \(T=250\). As such, we recommend use of a simple shrinkage estimator, with target zero and weight \(s_{\beta }\), namely \(\widehat{\beta } = s_{\beta } \widehat{\beta }_{\mathrm{MLE}}\). Some trial and error suggests \(s_{\beta }=0.3\) to be a reasonable choice for \(T=250\).
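The shrinkage step can be sketched in a few lines; `shrink_beta` is an illustrative name, and `beta_hat` stands in for the estimate delivered by the characteristic-function estimator.

```python
# Shrinkage of the stable asymmetry estimate toward a target of zero,
# with the weight s_beta = 0.3 suggested in the text for T = 250.
# `shrink_beta` is a hypothetical helper, not the authors' code.

def shrink_beta(beta_hat, s_beta=0.3):
    """Return the shrunk asymmetry parameter s_beta * beta_hat."""
    return s_beta * beta_hat

# A raw estimate of -0.8 is pulled toward zero.
shrunk = shrink_beta(-0.8)
print(shrunk)
```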

The motivation for using the stable is the conservative nature of the delivered ES. In particular, the first three methods we discussed are all based on asymmetric variations of the Student's t distribution, which, while clearly heavy-tailed (it does not possess a moment generating function on an open neighborhood of zero), still possesses a finite variance whenever the degrees of freedom exceed two; the stable, in contrast, has infinite variance except in the measure-zero case of \(\alpha =2\). As such, and because estimation is based on a finite amount of data, the ES delivered from the stable can be expected to be larger than that from the t-based models. This is desirable when more conservative estimates of risk are called for, and can also be expected to affect the optimized portfolio vectors and the performance of the method.

1.5 A.5 Discussion of Portfolio Tail Behavior and ES

It is worth mentioning that the actual tail behavior of financial assets is not necessarily heavy-tailed; the discussion in Heyde and Kou (2004) should settle this point. This explains why, on the one hand, exponential-tailed distributions, such as the mixed normal, can deliver excellent VaR predictions; see, e.g., Haas et al. (2004), Haas et al. (2013), and Paolella (2013); while, on the other hand, stable-Paretian GARCH models also work admirably well; see e.g., Mittnik et al. (2002) and Mittnik and Paolella (2003).

Further, observe that the tail behavior associated with \(P_{t+1 \mid t, \mathbf {w}}\), given the model and the parameters, is not subject to debate: by the nature of the model we employ, it involves convolutions of (dependent) random variables with power tails, and, as such, will also have power tails, and will (presumably) be in the domain of attraction of a stable law. It is, however, analytically intractable. Observe that it is fallacious to argue that, as our model involves use of the (noncentral) Student's t, with estimated degrees of freedom parameters (after application of the APARCH filter) above two, the convolution will have a finite variance, and so the stable distribution cannot be considered. It is crucial, first, to realize that the model we employ is wrong w.p.1 (and also subject to estimation error) and, second, to recall that, if an i.i.d. set of stable data with, say, \(\alpha =1.7\) is estimated as a location-scale Student's t model, the resulting estimated degrees of freedom will not be below two, but rather closer to four.

As such, we believe it makes sense to consider several methods of determining the ES, and compare them in terms of portfolio performance.

1.6 A.6 Comparison of Methods

The computation times for estimating the model and evaluating mean and ES for each of the four methods discussed above were compared. Based on a sample size of \(s_1=1e3\), the NCT method requires, on average, 0.20 seconds. The GAt and mixGAt require 0.23 and 1.96 seconds, respectively, while the stable requires 0.00064 seconds. Generation of \(s_1=1e6\) (1e3) samples requires approximately 2769.34 (2.91) seconds, and the empirical calculation of the mean and ES based on \(s_1=1e6\) requires approximately 0.35 seconds. The bottleneck in the generation of samples is the evaluation of the NCT quantile function in (13). In summary, it is fastest to use \(s_1=1e3\) samples and one of the parametric methods to obtain the mean and ES.

We now wish to compare the ES values delivered by each of the methods. For this, we fix the portfolio vector \(\mathbf {w}\) to be equally weighted, and use 100 moving windows of data, each of length 250, and compute, for each method, the ES corresponding to the one-day-ahead predictive distribution and the fixed equally weighted portfolio. All the ES values (the empirically determined ones as well as the parametric ones) are based on (the same) 1e5 replications. The 100 windows have starting dates 8 August 2012 to 31 December 2012 and use the \(d=30\) constituents (as of April 2013) of the Dow Jones Industrial Average index from Wharton/CRSP. The values are shown in Fig. 8, and have been smoothed to enhance visibility. As expected, the stable ES values are larger than those delivered from the t-based models and also the empirically determined ES values. The mixGAt is the most flexible distribution and approximates the empirical ES nearly exactly, though it takes the longest to compute of the four parametric methods.

Fig. 8.

Comparison of five methods of estimating ES, as discussed above, for a sequence of 100 rolling windows and based on the equally weighted portfolio. Each point was calculated based on (across the methods, the same) 1e5 replications. Left: The 100 values, for each method, plotted as a function of time. Right: The deviations of the four parametric methods from the empirical one.

Fig. 9.

Upper left: Percentage log returns of the equally weighted portfolio as used in the other panels. Mid and lower left: Boxplots of 1% ES values obtained from 50 simulations based on \(s_1\) draws from the fitted copula for different non-overlapping rolling windows of size 250, spanning Jan 4, 1993, to Dec 31, 2012. Timestamps denote the most recent date included in a data window. All values are obtained via the NCT estimator. Upper right: Boxplots of 1% ES values sorted in descending order by the average ES value, overlaid by the average of the estimated degrees of freedom parameters. Mid right: ES variances in log scale across rolling windows for different sample sizes \(s_1\), sorted by the average ES value per window. Lower right: Linear approximation of the above panel using ordinary least squares regression, overlaid by another linear approximation for the estimated degrees of freedom for the largest sample size \(s_1=3{,}200\) under study.

1.7 A.7 Calibrating the Number of Samples \(\mathbf {s_1}\)

As stated in Sect. 2.6, we wish to determine a heuristic for selecting the number of samples, \(s_1\), from the predictive copula distribution, in order to obtain the ES. This is conducted as follows. The copula model is estimated for all non-overlapping windows of length \(T=250\) based on the 30 components of the DJIA returns available from 4 Jan. 1993 to 31 Dec. 2012 and the ES of the predictive returns distribution for the equally weighted portfolio is computed. The goal is to determine an approximation to the smallest value of \(s_1\), say \(s_1^*\), such that the sampling variance of the ES determined from the parametric methods is less than some threshold. This value \(s_1^*\) is then linked to the tail thickness of the various predictive returns distributions over the non-overlapping windows.

To compute \(s_1^*\) for a particular data set, the ES is calculated \(n=50\) times for a fixed \(s_1\), based on simulation of the predictive returns distribution, having used the NCT and stable parametric forms for its approximation. This is conducted for a range of \(s_1\) values, and \(s_1^*\) is taken to be the smallest number such that the sample variance is less than a threshold value. Figure 9 shows the results for selected values of \(s_1\) for the NCT case. As expected, the ES variances across rolling windows decrease as \(s_1\) increases. As can be seen from the middle right panel, a roughly linear relationship is obtained for the logarithm of the ES variance. The analysis was also conducted for the stable Paretian distribution, resulting in a similar plot (not shown).
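The calibration loop just described can be sketched as follows. Here `simulate` is a stand-in for drawing \(s_1\) portfolio returns from the fitted predictive copula distribution (not reproduced here), and, for brevity, the ES is computed empirically rather than via the parametric NCT or stable approximations used in the text.

```python
import numpy as np

def empirical_es(x, level=0.01):
    """Empirical expected shortfall: mean loss beyond the level-quantile."""
    q = np.quantile(x, level)
    return x[x <= q].mean()

def calibrate_s1(simulate, grid=(200, 400, 800, 1600, 3200),
                 n_rep=50, threshold=np.exp(-2)):
    """Return the smallest s1 on the grid whose ES sampling variance,
    over n_rep repetitions, falls below the threshold."""
    for s1 in grid:
        es_vals = [empirical_es(simulate(s1)) for _ in range(n_rep)]
        if np.var(es_vals, ddof=1) < threshold:
            return s1
    return grid[-1]

# Toy illustration with a heavy-tailed Student's t predictive distribution:
rng = np.random.default_rng(0)
s1_star = calibrate_s1(lambda n: rng.standard_t(df=4, size=n))
print(s1_star)
```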

A simple regression approach then yields the following. For a threshold of \(\exp (-2)\),

(51)

The resulting procedure is then: From an initial set of 300 copula samples, the ES is evaluated, \(s_1\) is computed from (51), and, if \(s_1>300\), an additional \(s_1-300\) samples are drawn.
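This adaptive rule can be sketched as follows. The mapping from the initial ES to \(s_1\) in (51) is not reproduced here, so the placeholder `s1_from_es` (and the toy draw function) are assumptions for illustration only.

```python
import numpy as np

def adaptive_sample(draw, s1_from_es, n_init=300, level=0.01):
    """Draw n_init copula samples, evaluate the ES, map it through the
    fitted rule (51) -- represented here by the placeholder `s1_from_es` --
    and top up with s1 - n_init further draws if the rule asks for more."""
    x = draw(n_init)
    q = np.quantile(x, level)
    es = x[x <= q].mean()
    s1 = int(s1_from_es(es))
    if s1 > n_init:
        x = np.concatenate([x, draw(s1 - n_init)])
    return x

# Toy illustration with Student's t draws and a made-up rule for (51):
rng = np.random.default_rng(1)
out = adaptive_sample(lambda n: rng.standard_t(df=5, size=n),
                      s1_from_es=lambda es: 300 + int(50 * abs(es)))
print(len(out))
```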

B The Gaussian DCC-GARCH Model

Consider a d-dimensional vector of asset returns, \(\mathbf {Y}_t = \left( Y_{t,1},Y_{t,2},\ldots ,Y_{t,d} \right) '\). The ith univariate series, \(i=1,\ldots , d\), is assumed to follow a GARCH(1,1) model, which is a special case of (5). We assume an unknown mean \(\mu _i\), so that \(Y_{t,i} - \mu _i = \epsilon _{t,i} = Z_{t,i}\sigma _{t,i}\), \(\sigma ^2_{t,i} = c_{0,i} + c_{1,i} \left( Y_{t-1,i} - \mu _i \right) ^2 + d_{1,i} \sigma ^2_{t-1,i}\), and the \(Z_{t,i}\) are i.i.d. standard normal.
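A minimal sketch of this univariate volatility recursion, using the startup values discussed in B.1 (presample variance equal to the sample variance, with \(\kappa =1\)); `garch11_filter` and the simulated data are illustrative only.

```python
import numpy as np

def garch11_filter(y, mu, c0, c1, d1):
    """sigma2_t = c0 + c1*(y_{t-1} - mu)^2 + d1*sigma2_{t-1}, started from
    the presample values discussed in B.1 (sample variance, kappa = 1)."""
    eps = y - mu
    s2_pre = np.var(eps)          # presample sigma^2_0
    e2_pre = s2_pre               # presample eps^2_0 = kappa * sigma^2_0
    sigma2 = np.empty(len(y))
    sigma2[0] = c0 + c1 * e2_pre + d1 * s2_pre
    for t in range(1, len(y)):
        sigma2[t] = c0 + c1 * eps[t - 1] ** 2 + d1 * sigma2[t - 1]
    return sigma2, eps / np.sqrt(sigma2)   # variances and standardized residuals

# Illustration on simulated returns with plausible daily-scale parameters:
rng = np.random.default_rng(3)
y = 0.01 * rng.standard_normal(250)
s2, z = garch11_filter(y, mu=y.mean(), c0=1e-6, c1=0.05, d1=0.90)
```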

1.1 B.1 Estimation Using Profile Likelihood for Each GARCH Margin

The DCC multivariate structure can be expressed as

$$\begin{aligned} \mathbf {Y}_{t\vert t-1} \sim \text {N}_d(\varvec{\mu }, \mathbf {H}_t) , \quad \mathbf {H}_t=\mathbf {D}_t \mathbf {R}_t \mathbf {D}_t, \end{aligned}$$
(52)

with \(\varvec{\mu }=(\mu _1, \ldots , \mu _d)'\), \(\mathbf {D}_t^2 = \mathrm{diag}([\sigma ^2_{t,1}, \ldots , \sigma ^2_{t,d} ])\), and \(\{\mathbf {R}_t\}\) the set of \(d\times d\) matrices of time varying conditional correlations with dynamics specified by

$$\begin{aligned} \mathbf {R}_t = {\mathbb E}_{t-1} \left[ \varvec{\epsilon }_t \varvec{\epsilon }_t' \right] = \mathrm{diag} \big ( \mathbf {Q}_{t} \big )^{-1/2} \mathbf {Q}_{t} \mathrm{diag} \big ( \mathbf {Q}_{t} \big )^{-1/2}, \end{aligned}$$
(53)

\(t=1,\ldots , T\), where \(\varvec{\epsilon }_t = \mathbf {D}^{-1}_t\left( \mathbf {Y}_t-\varvec{\mu }\right) \). The \(\{\mathbf {Q}_t\}\) form a sequence of conditional matrices parameterized by

$$\begin{aligned} \mathbf {Q}_t = \mathbf {S}\left( 1-a-b \right) + a \left( \varvec{\epsilon }_{t-1} \varvec{\epsilon }_{t-1}' \right) +b \mathbf {Q}_{t-1}, \end{aligned}$$
(54)

with \(\mathbf {S}\) the \(d\times d\) unconditional correlation matrix (Engle 2002, p. 341) of the \(\varvec{\epsilon }_{t}\), and parameters a and b are estimated via maximum likelihood conditional on estimates of all other parameters, as discussed next. Matrices \(\mathbf {S}\) and \(\mathbf {Q}_{0}\) can be estimated with the usual plug-in sample correlation based on the filtered \(\varvec{\epsilon }_{t}\); see also Bali and Engle (2010) and Engle and Kelly (2012) on estimation of the DCC model. Observe that the resulting \(\mathbf {Q}_t\) from the update in (54) will not necessarily be precisely a correlation matrix; this is the reason for the standardization in (53). See Caporin and McAleer (2013) for several critiques of this DCC construction; and Aielli (2013) for a modified DCC model, termed cDCC, with potentially better small-sample properties. The CCC model is a special case of (52), with \(a=b=0\) in (54).
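The recursions (53)–(54), including the standardization of \(\mathbf {Q}_t\) to a correlation matrix, can be sketched as follows; `dcc_filter` is a hypothetical name, and the plug-in estimates of \(\mathbf {S}\) and \(\mathbf {Q}_0\) follow the text.

```python
import numpy as np

def dcc_filter(eps, a, b):
    """DCC recursion (53)-(54): Q_t = S(1-a-b) + a e e' + b Q_{t-1},
    with Q_t standardized to a correlation matrix R_t at every step.
    eps is the T x d array of GARCH-standardized residuals; S and Q_0
    use the plug-in sample correlation, as in the text."""
    T, d = eps.shape
    S = np.corrcoef(eps, rowvar=False)
    Q = S.copy()
    R = np.empty((T, d, d))
    for t in range(T):
        Dinv = np.diag(1.0 / np.sqrt(np.diag(Q)))
        R[t] = Dinv @ Q @ Dinv                      # standardize Q_t to R_t
        e = eps[t][:, None]
        Q = S * (1.0 - a - b) + a * (e @ e.T) + b * Q
    return R

# Illustration on simulated standardized residuals:
rng = np.random.default_rng(4)
R = dcc_filter(rng.standard_normal((100, 3)), a=0.05, b=0.90)
```

Note that, exactly as the text observes, the raw \(\mathbf {Q}_t\) is not a correlation matrix; the diagonal rescaling inside the loop enforces unit diagonal at every step.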

The mean vector, \(\varvec{\mu }\), can be set to zero, or estimated using the sample mean of the returns, as in Engle and Sheppard (2001) and McAleer et al. (2008), though in a more general non-Gaussian context, is best estimated jointly with the other parameters associated with each univariate return series; see Paolella and Polak (2017). Let \(\mathbf {Y} = [\mathbf {Y}_1, \ldots , \mathbf {Y}_T]'\), and denote the set of parameters as \(\varvec{\theta }\). The log-likelihood of the remaining parameters, conditional on \(\varvec{\mu }\), is given by

$$\begin{aligned} \ell (\varvec{\theta }; \mathbf {Y}, \varvec{\mu })&=-\frac{1}{2} \sum _{t} \left( d\ln (2\pi ) + \ln (\vert \mathbf {H}_t \vert ) + \left( \mathbf {Y}_t-\varvec{\mu }\right) ' \mathbf {H}^{-1}_t\left( \mathbf {Y}_t-\varvec{\mu }\right) \right) \\&=-\frac{1}{2} \sum _{t} \left( d\ln (2\pi ) + 2 \ln (\vert \mathbf {D}_t\vert ) + \ln (\vert \mathbf {R}_t\vert ) + \varvec{\epsilon }_{t}' \mathbf {R}^{-1}_t \varvec{\epsilon }_{t}\right) . \end{aligned}$$

Then, as in Engle (2002), adding and subtracting \(\varvec{\epsilon }_{t}' \varvec{\epsilon }_{t}\), \(\ell \) can be decomposed as the sum of volatility and correlation terms, \(\ell = \ell _V + \ell _C\), where

$$\begin{aligned} \ell _V = -\frac{1}{2} \sum _{t} \big ( d\ln (2\pi ) + 2 \ln (\vert \mathbf {D}_t\vert ) +\varvec{\epsilon }_{t}' \varvec{\epsilon }_{t} \big ), \quad \ell _C = -\frac{1}{2} \sum _{t} \big (\ln (\vert \mathbf {R}_t\vert ) +\, \varvec{\epsilon }_{t}' \mathbf {R}^{-1}_t \varvec{\epsilon }_{t}-\varvec{\epsilon }_{t}' \varvec{\epsilon }_{t} \big ), \end{aligned}$$

so that a two-step maximum likelihood estimation procedure can be applied: First, estimate the GARCH model parameters for each univariate returns series and construct the standardized residuals; second, maximize the conditional likelihood with respect to parameters a and b in (54) based on the filtered residuals from the previous step. We now discuss this first step in more detail.

While Francq and Zakoïan (2004) prove the consistency and asymptotic normality of the GARCH maximum likelihood estimator, interest centers on its numerical computation. Dropping the subscript i, the choice of starting values for \(\hat{c}_0\), \(\hat{c}_1\), and \(\hat{d}_1\) is important, as the log-likelihood can exhibit more than one local maximum. This issue of multiple maxima has been noted by Ma et al. (2006), Winker and Maringer (2009), and Paolella and Polak (2015b), though it seems to be often ignored, and can lead to inferior forecasts and jeopardize results in applied work. This unfortunate observation might help explain the results of Brooks et al. (2001, p. 54) in their extensive comparison of econometric software. In particular, they find that, with respect to estimating just the simple normal GARCH model, “the results produced using a default application of several of the most popular econometrics packages differ considerably from one another”. Another reason for discrepant results is the choice of \(\epsilon _0\) and \(\sigma _0\) to start the GARCH(1,1) recursion, for which several suggestions exist in the literature. We take \(\hat{\sigma }^2_0\) to be the sample unconditional variance of the returns, and \(\hat{\epsilon }^2_0 = \kappa \hat{\sigma }^2_0\), where

$$\begin{aligned} \kappa := {\mathbb E}\big [ \left( \left| Z \right| - g Z \right) ^{\delta } \big ] \end{aligned}$$
(55)

depends on the density specification \(f_Z\left( \cdot \right) \) and is stated for the more general APARCH model (5). For \(Z \sim \text {N}(0,1)\), a trivial calculation yields

$$\begin{aligned} {\mathbb E}\big [ \left( \left| Z \right| - g Z \right) ^{\delta } \big ] = \frac{1}{\sqrt{2\pi }} \left[ \left( 1+ g \right) ^{\delta } + \left( 1 - g \right) ^{\delta } \right] 2^{(\delta -1)/2} \Gamma \left( \frac{\delta +1}{2}\right) . \end{aligned}$$

In our case, with \(\delta =2\) and \(g =0\), this reduces to \(\kappa = {\mathbb E}\big [ \left| Z \right| ^2 \big ] = 1\).
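The closed-form expression for \(\kappa \) is easily checked numerically against Monte Carlo; `kappa_normal` is an illustrative name for this sketch.

```python
import numpy as np
from math import gamma, pi, sqrt

def kappa_normal(g, delta):
    """Closed-form E[(|Z| - g Z)^delta] for Z ~ N(0,1), as in (55)."""
    return (((1 + g) ** delta + (1 - g) ** delta)
            * 2 ** ((delta - 1) / 2) * gamma((delta + 1) / 2) / sqrt(2 * pi))

# delta = 2, g = 0 recovers E|Z|^2 = 1, as stated in the text.
k0 = kappa_normal(0.0, 2.0)

# Monte Carlo check for an asymmetric APARCH-style configuration:
rng = np.random.default_rng(2)
z = rng.standard_normal(10**6)
g, delta = 0.3, 1.5
mc = np.mean((np.abs(z) - g * z) ** delta)
```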

Paolella and Polak (2015b) demonstrate the phenomenon of multiple maxima with a real (and typical) data set, and propose a solution that is simple to implement, making use of the profile log-likelihood (p.l.) obtained by fixing the value of \(c_0\), over a grid of \(c_0\) values between zero and 1.1 times the sample variance of the series. That is, for a fixed value of \(c_0\), we compute

$$\begin{aligned} \widehat{\varvec{\theta }}_{\mathrm{p.l.}}(c_0) = \arg \max _{\varvec{\theta }_{\mathrm{p.l.}}} \ell (\varvec{\theta }_{\mathrm{p.l.}}; \mathbf {Y}), \qquad \varvec{\theta }_{\mathrm{p.l.}} = (c_1, d_1)'. \end{aligned}$$
(56)

To obtain (with high probability) the global maximum, the following procedure suggests itself: (i) Based on a set of \(c_0\) values, compute (56); (ii) take the value of \(c_0\) from the set, say \(c_0^{*}\), and its corresponding \(\widehat{\varvec{\theta }}_{\mathrm{p.l.}}(c_0^{*})\) that results in the largest log-likelihood as starting values, to (iii) estimate the full model. The finer the grid, the higher the probability of reaching the global maximum; some trials suggest that a grid of length 10 is adequate. The use of more parameters, as arise with more elaborate GARCH structures such as the APARCH formulation, or additional shape parameter(s) of a non-Gaussian distribution such as the NCT or stable Paretian, can further exacerbate the problem of multiple local maxima of the likelihood.

1.2 B.2 Remarks on DCC

One might argue that only two parameters for modeling the evolution of an entire correlation matrix will not be adequate. While this is certainly true, the models of Engle (2002) and Tse and Tsui (2002) have two strong points: First, two parameters are arguably better than none (as in the CCC model), and second, the structure allows for easy implementation and estimation. Generalizations of the simple DCC structure that allow the number of parameters to be a function of d, and that introduce asymmetric extensions of the DCC idea, are considered in Engle (2002) and Cappiello et al. (2006), though with a potentially very large number of parameters, the usual estimation and inferential problems arise.

Bauwens and Rombouts (2007) consider an approach in which similar series are pooled into one of a small number of clusters, such that their GARCH parameters are the same within a cluster. A related idea is to group series with respect to their correlations, generalizing the DCC model; see, e.g., Vargas (2006), Billio et al. (2006), Zhou and Chan (2008), Billio and Caporin (2009), Engle and Kelly (2012), So and Yip (2012), Aielli and Caporin (2013), and the references therein.

An alternative approach is to assume a Markov switching structure between two (or more) regimes, each of which has a CCC structure, as first proposed in Pelletier (2006), and augmented to the non-Gaussian case in Paolella et al. (2017). Such a construction implies many additional parameters, but their estimation makes use of the usual sample correlation estimator, thus avoiding the curse of dimensionality, and shrinkage estimation can be straightforwardly invoked to improve performance. The idea is that, for a given time segment, the correlations are constant, taking on one of a small number of sets of values (usually two, at most three). This appears to be better than attempting to construct a model that allows for their variation at every point in time; the latter might be “asking too much of the data” and be inundated with too many parameters. Paolella et al. (2017) demonstrate strong out-of-sample performance of their non-Gaussian Markov switching CCC model with two regimes, compared to the Gaussian CCC case, the Gaussian CCC switching case, the Gaussian DCC model, and the non-Gaussian single component CCC of Paolella and Polak (2015b).

Copyright information

© 2018 Springer International Publishing AG

Cite this paper

Paolella, M.S., Polak, P. (2018). COBra: Copula-Based Portfolio Optimization. In: Kreinovich, V., Sriboonchitta, S., Chakpitak, N. (eds) Predictive Econometrics and Big Data. TES 2018. Studies in Computational Intelligence, vol 753. Springer, Cham. https://doi.org/10.1007/978-3-319-70942-0_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70941-3

  • Online ISBN: 978-3-319-70942-0
