Abstract
The meta-elliptical t copula with noncentral t GARCH univariate margins is studied as a model for asset allocation. A method of parameter estimation is deployed that is nearly instantaneous for large dimensions. The expected shortfall of the portfolio distribution is obtained by combining simulation with a parametric approximation for speed enhancement. A simulation-based method for mean-expected shortfall portfolio optimization is developed. An extensive out-of-sample backtest exercise is conducted, and comparisons are made with common asset allocation techniques.
M.S. Paolella—Financial support by the Swiss National Science Foundation (SNSF) through project #150277 is gratefully acknowledged.
References
Aas, K.: Pair-copula constructions for financial applications: a review. Econometrics 4(4), 1–15 (2016). Article 43
Aas, K., Czado, C., Frigessi, A., Bakken, H.: Pair-Copula Constructions of Multiple Dependence. Insur. Math. Econ. 44, 182–198 (2009)
Abdous, B., Genest, C., Rémillard, B.: Dependence Properties of Meta-Elliptical Distributions. In: Duchesne, P., Rémillard, B. (eds.) Statistical Modeling and Analysis for Complex Data Problems. Springer Verlag, New York (2005). Chapter 1
Adcock, C.J.: Asset pricing and portfolio selection based on the multivariate extended skew-student-\(t\) distribution. Ann. Oper. Res. 176(1), 221–234 (2010)
Adcock, C.J.: Mean-variance-skewness efficient surfaces, Stein’s lemma and the multivariate extended skew-student distribution. Eur. J. Oper. Res. 234(2), 392–401 (2014)
Adcock, C.J., Eling, M., Loperfido, N.: Skewed distributions in finance and actuarial science: a preview. Eur. J. Financ. 21(13–14), 1253–1281 (2015)
Aielli, G.P.: Dynamic conditional correlation: on properties and estimation. J. Bus. Econ. Stat. 31(3), 282–299 (2013)
Aielli, G.P., Caporin, M.: Fast clustering of GARCH processes via Gaussian mixture models. Math. Comput. Simul. 94, 205–222 (2013)
Asai, M.: Heterogeneous asymmetric dynamic conditional correlation model with stock return and range. J. Forecast. 32(5), 469–480 (2013)
Ausin, M.C., Lopes, H.F.: Time-varying joint distribution through copulas. Comput. Stat. Data Anal. 54, 2383–2399 (2010)
Bailey, D.H., Borwein, J.M., López de Prado, M., Zhu, Q.J.: Pseudo-mathematics and financial charlatanism: the effects of backtest overfitting on out-of-sample performance. Not. Am. Math. Soc. 61(5), 458–471 (2014)
Bailey, D.H., Borwein, J.M., López de Prado, M., Zhu, Q.J.: The probability of backtest overfitting. J. Comput. Finan. (2016). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2840838
Bali, T.G., Engle, R.F.: The intertemporal capital asset pricing model with dynamic conditional correlations. J. Monetary Econ. 57(4), 377–390 (2010)
Fundamental Review of the Trading Book: A Revised Market Risk Framework. Consultative document, Bank for International Settlements, Basel (2013)
Bauwens, L., Rombouts, J.V.K.: Bayesian clustering of many GARCH models. Econometric Rev. 26(2), 365–386 (2007)
Billio, M., Caporin, M.: A generalized dynamic conditional correlation model for portfolio risk evaluation. Math. Comput. Simul. 79(8), 2566–2578 (2009)
Billio, M., Caporin, M., Gobbo, M.: Flexible dynamic conditional correlation multivariate GARCH models for asset allocation. Appl. Financ. Econ. Lett. 2(2), 123–130 (2006)
Bloomfield, T., Leftwich, R., Long, J.: Portfolio strategies and performance. J. Financ. Econ. 5, 201–218 (1977)
Bollerslev, T.: A conditional heteroskedastic time series model for speculative prices and rates of return. Rev. Econ. Stat. 69, 542–547 (1987)
Bollerslev, T.: Modeling the coherence in short-run nominal exchange rates: a multivariate Generalized ARCH approach. Rev. Econ. Stat. 72, 498–505 (1990)
Broda, S.A., Haas, M., Krause, J., Paolella, M.S., Steude, S.C.: Stable mixture GARCH models. J. Econometrics 172(2), 292–306 (2013)
Broda, S.A., Paolella, M.S.: Expected Shortfall for Distributions in Finance. In: Čížek, P., Härdle, W., Weron, R. (eds.) Statistical Tools for Finance and Insurance. Springer, Berlin (2011)
Brooks, C., Burke, S.P., Persand, G.: Benchmarks and the accuracy of GARCH model estimation. Int. J. Forecast. 17(1), 45–56 (2001)
Brown, S. J., Hwang, I., In, F.: Why Optimal Diversification Cannot Outperform Naive Diversification: Evidence from Tail Risk Exposure (2013)
Bücher, A., Jäschke, S., Wied, D.: Nonparametric tests for constant tail dependence with an application to energy and finance. J. Econometrics 187(1), 154–168 (2015)
Cambanis, S., Huang, S., Simons, G.: On the theory of elliptically contoured distributions. J. Multivar. Anal. 11(3), 368–385 (1981)
Caporin, M., McAleer, M.: Ten things you should know about the dynamic conditional correlation representation. Econometrics 1(1), 115–126 (2013)
Cappiello, L., Engle, R.F., Sheppard, K.: Asymmetric dynamics in the correlations of global equity and bond returns. J. Financ. Econometrics 4(4), 537–572 (2006)
Chicheportiche, R., Bouchaud, J.-P.: The joint distribution of stock returns is not elliptical. Int. J. Theor. Appl. Financ. 15(3), 1250019 (2012)
Christoffersen, P., Errunza, V., Jacobs, K., Langlois, H.: Is the potential for international diversification disappearing? a dynamic copula approach. Rev. Financ. Stud. 25, 3711–3751 (2012)
Clare, A., O’Sullivan, N., Sherman, M.: Benchmarking UK mutual fund performance: the random portfolio experiment. Int. J. Financ. (2015). https://www.ucc.ie/en/media/research/centreforinvestmentresearch/RandomPortfolios.pdf
Demarta, S., McNeil, A.J.: The \(t\) copula and related copulas. Int. Stat. Rev. 73(1), 111–129 (2005)
DeMiguel, V., Garlappi, L., Uppal, R.: Optimal versus naive diversification: how inefficient is the \(1/N\) portfolio strategy? Rev. Financ. Stud. 22(5), 1915–1953 (2009)
DeMiguel, V., Martin-Utrera, A., Nogales, F.J.: Size matters: optimal calibration of shrinkage estimators for portfolio selection. J. Bank. Financ. 37(8), 3018–3034 (2013)
Devroye, L.: Non-Uniform Random Variate Generation. Springer Verlag, New York (1986)
Ding, P.: On the conditional distribution of the multivariate \(t\) distribution. Am. Stat. 70(3), 293–295 (2016)
Ding, Z., Granger, C.W.J., Engle, R.F.: A long memory property of stock market returns and a new model. J. Empir. Financ. 1(1), 83–106 (1993)
Edwards, T., Lazzara, C.J.: Equal-Weight Benchmarking: Raising the Monkey Bars. Technical report, McGraw Hill Financial (2014)
Embrechts, P.: Copulas: a personal view. J. Risk Insur. 76, 639–650 (2009)
Embrechts, P., McNeil, A., Straumann, D.: Correlation and dependency in risk management: properties and pitfalls. In: Dempster, M.A.H. (ed.) Risk Management: Value at Risk and Beyond, pp. 176–223. Cambridge University Press, Cambridge (2002)
Engle, R.: Anticipating Correlations: A New Paradigm for Risk Management. Princeton University Press, Princeton (2009)
Engle, R., Kelly, B.: Dynamic equicorrelation. J. Bus. Econ. Stat. 30(2), 212–228 (2012)
Engle, R.F.: Dynamic conditional correlation: a simple class of multivariate generalized autoregressive conditional heteroskedasticity models. J. Bus. Econ. Stat. 20, 339–350 (2002)
Engle, R.F., Sheppard, K.: Theoretical and Empirical Properties of Dynamic Conditional Correlation Multivariate GARCH. NBER Working Papers 8554, National Bureau of Economic Research Inc (2001)
Fang, H.B., Fang, K.T., Kotz, S.: The meta-elliptical distribution with given marginals. J. Multivar. Anal. 82, 1–16 (2002)
Fang, K.-T., Kotz, S., Ng, K.-W.: Symmetric Multivariate and Related Distributions. Chapman & Hall, London (1989)
Fink, H., Klimova, Y., Czado, C., Stöber, J.: Regime switching vine copula models for global equity and volatility indices. Econometrics 5(1), 1–38 (2017). Article 3
Francq, C., Zakoïan, J.-M.: Maximum likelihood estimation of pure GARCH and ARMA-GARCH processes. Bernoulli 10(4), 605–637 (2004)
Francq, C., Zakoïan, J.-M.: GARCH Models: Structure Statistical Inference and Financial Applications. John Wiley & Sons Ltd., Chichester (2010)
Gambacciani, M., Paolella, M.S.: Robust normal mixtures for financial portfolio allocation. Econometrics and Statistics (2017, forthcoming)
Haas, M., Krause, J., Paolella, M.S., Steude, S.C.: Time-varying mixture GARCH models and asymmetric volatility. North Am. J. Econ. Financ. 26, 602–623 (2013)
Haas, M., Mittnik, S., Paolella, M.S.: Mixed normal conditional heteroskedasticity. J. Financ. Econometrics 2(2), 211–250 (2004)
He, C., Teräsvirta, T.: Properties of moments of a family of GARCH processes. J. Econometrics 92(1), 173–192 (1999a)
He, C., Teräsvirta, T.: Statistical properties of the asymmetric power ARCH model. In: Engle, R.F., White, H. (eds) Cointegration, Causality, and Forecasting. Festschrift in Honour of Clive W. J. Granger, pp. 462–474. Oxford University Press (1999b)
Heyde, C.C., Kou, S.G.: On the controversy over tailweight of distributions. Oper. Res. Lett. 32, 399–408 (2004)
Hough, J.: Monkeys are better stockpickers than you’d think. Barron’s magazine (2014)
Hurst, S.: The characteristic function of the student \(t\) distribution. Financial Mathematics Research Report FMRR006-95, Australian National University, Canberra (1995). http://wwwmaths.anu.edu.au/research.reports/srr/95/044/
Jagannathan, R., Ma, T.: Risk reduction in large portfolios: why imposing the wrong constraints helps. J. Financ. 58(4), 1651–1683 (2003)
Jondeau, E.: Asymmetry in tail dependence of equity portfolios. Computat. Stat. Data Anal. 100, 351–368 (2016)
Jondeau, E., Rockinger, M.: Conditional volatility, skewness, and kurtosis: existence, persistence, and comovements. J. Econ. Dyn. Control 27, 1699–1737 (2003)
Jondeau, E., Rockinger, M.: The Copula-GARCH model of conditional dependencies: an international stock market application. J. Int. Money Financ. 25, 827–853 (2006)
Jondeau, E., Rockinger, M.: On the importance of time variability in higher moments for asset allocation. J. Financ. Econometrics 10(1), 84–123 (2012)
Karanasos, M., Kim, J.: A re-examination of the asymmetric power ARCH model. J. Empir. Financ. 13, 113–128 (2006)
Kelker, D.: Distribution theory of spherical distributions and a location-scale parameter generalization. Sankhyā, Series A 32(4), 419–430 (1970)
Kiefer, J., Wolfowitz, J.: Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Stat. 27(4), 887–906 (1956)
Kogon, S.M., Williams, D.B.: Characteristic function based estimation of stable parameters. In: Adler, R.J., Feldman, R.E., Taqqu, M.S. (eds) A Practical Guide to Heavy Tails, pp. 311–335. Birkhauser Boston Inc. (1998)
Krause, J., Paolella, M.S.: A fast, accurate method for value at risk and expected shortfall. Econometrics 2, 98–122 (2014)
Kuester, K., Mittnik, S., Paolella, M.S.: Value-at-risk prediction: a comparison of alternative strategies. J. Financ. Econometrics 4, 53–89 (2006)
Ling, S., McAleer, M.: Necessary and sufficient moment conditions for the garch(\(r, s\)) and asymmetric power garch(\(r, s\)) models. Econometric Theor. 18(3), 722–729 (2002)
Ma, J., Nelson, C.R., Startz, R.: Spurious inference in the GARCH(1,1) model when it is weakly identified. Stud. Nonlinear Dyn. Econometrics 11(1), 1–27 (2006). Article 1
Markowitz, H.: Portfolio Selection. J. Financ. 7(1), 77–91 (1952)
McAleer, M., Chan, F., Hoti, S., Lieberman, O.: Generalized autoregressive conditional correlation. Econometric Theor. 24(6), 1554–1583 (2008)
McNeil, A.J., Frey, R., Embrechts, P.: Quantitative Risk Management: Concepts, Techniques, and Tools. Princeton University Press, Princeton (2005)
McNeil, A.J., Frey, R., Embrechts, P.: Quantitative Risk Management: Concepts, Techniques, and Tools. Princeton University Press, Princeton (2015). Revised edition
Mittnik, S., Paolella, M.S.: Prediction of financial downside risk with heavy tailed conditional distributions. In: Rachev, S.T. (ed.) Handbook of Heavy Tailed Distributions in Finance. Elsevier Science, Amsterdam (2003)
Mittnik, S., Paolella, M.S., Rachev, S.T.: Stationarity of stable power-GARCH processes. J. Econometrics 106, 97–107 (2002)
Nguyen, H.T.: On evidential measures of support for reasoning with integrate uncertainty: a lesson from the ban of P-values in statistical inference. In: Huynh, V.-N., Inuiguchi, M., Le, B., Le, B.N., Denoeux, T. (eds.) 5th International Symposium on Integrated Uncertainty in Knowledge Modeling and Decision Making IUKM 2016, pp. 3–15. Springer, Cham (2016)
Nolan, J. P.: Stable Distributions - Models for Heavy Tailed Data. Birkhäuser, Boston (2015, forthcoming). Chapter 1 online
Paolella, M.S.: Intermediate Probability: A Computational Approach. John Wiley & Sons, Chichester, West Sussex, England (2007)
Paolella, M.S.: Multivariate asset return prediction with mixture models. Eur. J. Financ. 21, 1–39 (2013)
Paolella, M.S.: Fast methods for large-scale non-elliptical portfolio optimization. Ann. Financ. Econ. 09(02), 1440001 (2014)
Paolella, M.S.: Stable-GARCH models for financial returns: fast estimation and tests for stability. Econometrics 4(2), 25 (2016). Article 25
Paolella, M.S.: The univariate collapsing method for portfolio optimization. Econometrics 5(2), 1–33 (2017). Article 18
Paolella, M.S., Polak, P.: ALRIGHT: Asymmetric LaRge-Scale (I)GARCH with hetero-tails. Int. Rev. Econ. Financ. 40, 282–297 (2015a)
Paolella, M.S., Polak, P.: COMFORT: A common market factor non-gaussian returns model. J. Econometrics 187(2), 593–605 (2015b)
Paolella, M.S., Polak, P.: Portfolio Selection with Active Risk Monitoring. Research paper, Swiss Finance Institute (2015c)
Paolella, M.S., Polak, P.: Density and Risk Prediction with Non-Gaussian COMFORT Models (2017). Submitted
Paolella, M.S., Polak, P., Walker, P.: A Flexible Regime-Switching Model for Asset Returns (2017). Submitted
Patton, A.J.: A review of copula models for economic time series. J. Multivar. Anal. 110, 4–18 (2012)
Pelletier, D.: Regime switching for dynamic correlations. J. Econometrics 131, 445–473 (2006)
Righi, M.B., Ceretta, P.S.: Individual and flexible expected shortfall backtesting. J. Risk Model Valid. 7(3), 3–20 (2013)
Righi, M.B., Ceretta, P.S.: A comparison of expected shortfall estimation models. J. Econ. Bus. 78, 14–47 (2015)
Samorodnitsky, G., Taqqu, M.S.: Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman & Hall, London (1994)
Scherer, M.: CDO pricing with nested Archimedean copulas. Quant. Financ. 11, 775–787 (2011)
Shaw, W.T.: Monte Carlo Portfolio Optimization for General Investor Risk-Return Objectives and Arbitrary Return Distributions: a Solution for Long-only Portfolios (2010)
So, M.K.P., Yip, I.W.H.: Multivariate GARCH models with correlation clustering. J. Forecast. 31(5), 443–468 (2012)
Song, D.-K., Park, H.-J., Kim, H.-M.: A note on the characteristic function of multivariate \(t\) distribution. Commun. Stat. Appl. Methods 21(1), 81–91 (2014)
Stoyanov, S., Samorodnitsky, G., Rachev, S., Ortobelli, S.: Computing the portfolio conditional value-at-risk in the alpha-stable case. Probab. Math. Statistics 26, 1–22 (2006)
Sutradhar, B.C.: On the characteristic function of multivariate student \(t\)-distribution. Can. J. Stat. 14(4), 329–337 (1986)
Tse, Y.K., Tsui, A.K.C.: A multivariate generalized autoregressive conditional heteroscedasticity model with time-varying correlations. J. Bus. Econ. Stat. 20(3), 351–362 (2002)
Vargas, G.A.: An asymmetric block dynamic conditional correlation multivariate GARCH model. Philippine Stat. 55(1–2), 83–102 (2006)
Winker, P., Maringer, D.: The convergence of estimators based on heuristics: theory and application to a GARCH model. Comput. Stat. 24(3), 533–550 (2009)
Ledoit, O., Wolf, M.: Honey, I shrunk the sample covariance matrix: problems in mean-variance optimization. J. Portfolio Management 30(4), 110–119 (2004)
Zhou, T., Chan, L.: Clustered dynamic conditional correlation multivariate GARCH model. In: Song, I.-Y., Eder, J., Nguyen, T. M. (eds) Proceedings of the 10th International Conference Data Warehousing and Knowledge Discovery, DaWaK 2008, Turin, Italy, 2–5 September 2008, pp. 206–216 (2008)
Zolotarev, V.M.: One Dimensional Stable Distributions (Translations of Mathematical Monograph, Vol. 65). American Mathematical Society, Providence, RI (1986). Translated from the original Russian version (1983)
Appendices
A Parametric Forms for Approximating the Distribution of \(\widetilde{\mathbf {R}}_{P}\)
We detail here the four candidate parametric structures mentioned in Sect. 2.6.
A.1 The Noncentral Student’s t
The first is the location-scale \(\mathrm {NCT}^{*}\) distribution (3). As location \(\mu \) and scale \(\sigma \) parameters need to be estimated along with the \(\mathrm {NCT}^{*}\) shape parameters, we compute
Starting values are taken to be the 50% trimmed mean for \(\mu \) (i.e., the lower and upper 25% of the sorted sample are ignored) and, for \(\sigma \), the value \((s^2/2)^{1/2}\) obtained from (6) with \(\nu =4\) and \(\gamma =0\), where \(s^2\) denotes the sample variance. Two box constraints, \(q_{0.25}< \widehat{\mu }<q_{0.75}\) and \((s^2/10)^{1/2}< \widehat{\sigma } < s\), are imposed during estimation, where \(q_{\xi }\) denotes the \(\xi \)th sample quantile. The mean and variance are then determined from (6), while the ES is obtained essentially instantaneously from the KP method via a table-lookup procedure, noting that, for any probability \(0<\xi <1\), \(\mathrm{ES}({{P}}_{t+1 \mid t, \mathbf {w}}; \xi ) = \mu + \sigma \mathrm{ES}({{Z}}_{t+1 \mid t, \mathbf {w}}; \xi )\).
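For illustration, the location-scale ES identity and the starting values just described can be sketched as follows. A central Student's t stands in for the standardized \(\mathrm{NCT}^{*}\) (whose quantiles and ES the KP table lookup would deliver), so `es_t` below is a simplified stand-in for the KP method, not the method itself.

```python
import numpy as np
from scipy import stats

def es_t(xi, nu):
    # Lower-tail expected shortfall E[Z | Z < q_xi] of a standard Student's t
    # (stand-in for the standardized NCT*; requires nu > 1).
    q = stats.t.ppf(xi, nu)
    return -stats.t.pdf(q, nu) * (nu + q ** 2) / ((nu - 1.0) * xi)

def es_location_scale(mu, sigma, xi, nu):
    # ES(P; xi) = mu + sigma * ES(Z; xi) for P = mu + sigma * Z.
    return mu + sigma * es_t(xi, nu)

def starting_values(x):
    # mu: 50% trimmed mean (drop the lower and upper 25% of the sorted sample);
    # sigma: (s^2/2)^(1/2), since Var(Z) = nu/(nu - 2) = 2 at nu = 4, gamma = 0.
    xs = np.sort(np.asarray(x))
    n = len(xs)
    mu0 = xs[n // 4 : n - n // 4].mean()
    sigma0 = np.sqrt(np.var(x, ddof=1) / 2.0)
    return mu0, sigma0
```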
A.2 The Generalized Asymmetric t
The second candidate is the five-parameter generalized asymmetric t, or GAt distribution. The pdf is
where \(d,\nu ,\theta \in {\mathbb R}_{> 0}\), and \(K^{-1}=(\theta ^{-1} + \theta ) d^{-1} \nu ^{1{/}d} B(1{/}d,\nu )\). It is noteworthy because its limiting cases include the generalized exponential distribution (GED), and hence the Laplace and normal, while the Student’s \(t\) (and, thus, the Cauchy) distributions are special cases. For \(\theta >1\) (\(\theta <1\)), the distribution is skewed to the right (left), while for \(\theta =1\), it is symmetric. See Paolella (2007, p. 273) for further details. The \(r\)th moment, for integer \(r\) such that \(0 \le r < \nu d\), is
i.e., the mean is
when \(\nu d >1\), and the variance is computed in the obvious way. The cumulative distribution function (cdf) of \(Z \sim \mathrm{GA}t(d,\nu ,\theta )\) is
where the incomplete beta ratio is given by
For computing the ES, we require \({\mathbb E}[Z^r\mid Z<c]\) for \(r=1\). For \(c<0\), this is given by
The existence of the mean and the ES requires \(\nu d >1\).
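The GAt density itself is not reproduced above, so the following sketch is an assumption: it uses the kernel form implied by the stated normalizing constant \(K\) (argument \(-z\theta \) on the left, \(z/\theta \) on the right) and numerically checks that the density integrates to one and is right-skewed for \(\theta >1\).

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import beta as betafn

def gat_pdf(z, d, nu, theta):
    # Assumed GAt kernel, chosen to be consistent with the stated constant
    #   K^{-1} = (theta^{-1} + theta) d^{-1} nu^{1/d} B(1/d, nu):
    # the left tail is compressed by theta and the right tail stretched by it,
    # so theta > 1 skews the density to the right.
    z = np.asarray(z, dtype=float)
    K = d / ((1.0 / theta + theta) * nu ** (1.0 / d) * betafn(1.0 / d, nu))
    arg = np.where(z < 0.0, -z * theta, z / theta)
    return K * (1.0 + arg ** d / nu) ** (-(nu + 1.0 / d))

# Sanity checks: unit mass, and for theta > 1 the mass above zero is
# theta^2 / (theta^2 + 1) > 1/2.
total, _ = quad(lambda z: gat_pdf(z, 2.0, 2.0, 1.5), -np.inf, np.inf)
upper, _ = quad(lambda z: gat_pdf(z, 2.0, 2.0, 1.5), 0.0, np.inf)
```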
A.3 The Two-Component Mixture GAt
With five parameters (including location and scale), the GAt is a rather flexible distribution. However, as our third choice, greater accuracy can be obtained by using a two-component mixture of GAt distributions, with mixing parameters \(0<\lambda _1<1\) and \(\lambda _2=1-\lambda _1\). This 11-parameter construction is extraordinarily flexible and should be quite adequate for modeling the portfolio distribution. We also assume that the true distribution is not (single-component) GAt, and that the distributional class of two-component GAt mixtures is identified. Its pdf and cdf are just weighted sums of GAt pdfs and cdfs, respectively, so that evaluation of the cdf is no more involved than that of the GAt. Let P denote a K-component mixGAt distribution, where each component has the three aforementioned shape parameters, as well as location \(u_i\) and scale \(c_i\), \(i=1,\dots , K\). First observe that the cdf of the mixture is given by
where the ith cdf mixture component is given as the closed-form expression in (45), so that a quantile can be found by simple one-dimensional root searching. Similar to calculations for the ES of mixture distributions in Broda and Paolella (2011), the ES of the mixture is given by
where \(q_{P,\xi }\) is the \(\xi \)-quantile of P, \(S_{1,Z_j}\) is given in (46), and \(F_{Z_j}\) is the cdf of the GAt random variable given in (45), both functions evaluated with the parameters \(d_j\), \(\nu _j\), and \(\theta _j\) from the mixture components, \(Z_j\), for \(j=1,\dots ,K\).
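The quantile-by-root-search and mixture-ES computations can be sketched as follows. Location-scale Student's t components stand in for the GAt components, whose closed-form cdf (45) and truncated mean (46) would be used in practice; all parameter values below are illustrative.

```python
import numpy as np
from scipy import stats, optimize, integrate

# Illustrative two-component location-scale mixture.
lam = np.array([0.7, 0.3])    # mixing weights, positive and summing to one
nu  = np.array([5.0, 3.0])    # component shape parameters
u   = np.array([0.0, -0.5])   # component locations u_i
c   = np.array([1.0, 2.0])    # component scales c_i

def mix_cdf(x):
    # Mixture cdf: weighted sum of component cdfs.
    return float(sum(l * stats.t.cdf((x - ui) / ci, df)
                     for l, df, ui, ci in zip(lam, nu, u, c)))

def mix_quantile(xi):
    # Simple one-dimensional root search on the mixture cdf.
    return optimize.brentq(lambda x: mix_cdf(x) - xi, -1e4, 1e4)

def mix_es(xi):
    # ES(xi) = E[X | X <= q_xi] = (1/xi) * int_{-inf}^{q_xi} x f(x) dx.
    q = mix_quantile(xi)
    pdf = lambda x: sum(l * stats.t.pdf((x - ui) / ci, df) / ci
                        for l, df, ui, ci in zip(lam, nu, u, c))
    val, _ = integrate.quad(lambda x: x * pdf(x), -np.inf, q)
    return val / xi
```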
While estimation of the two-component mixture GAt is straightforward using standard ML estimation, it was found that this occasionally resulted in an inferior, possibly bimodal fit that visually did not agree well with a kernel density estimate. This artefact arises from the nature of mixture distributions and the problems associated with the likelihood. We present a method that leads, with far higher probability, to a successful model fit, based on a so-called augmented likelihood procedure. The technique was first presented in Broda et al. (2013) and is adapted to the mixture GAt as follows.
Let \(f(x; \varvec{\theta })=\sum _{i=1}^K \lambda _i f_i(x; \varvec{\theta }_i)\) be the univariate pdf of a K-component (finite) mixture distribution with component weights \(\lambda _1, \ldots , \lambda _K\) positive and summing to one. The likelihood function is \(\ell ^{\star }(\varvec{\theta }; \mathbf {x}) = \sum _{t=1}^{T} \log f(x_t; \varvec{\theta })\) (49),
where \(\mathbf {x}=(x_1,\dots ,x_T)'\) is the sequence of evaluation points, and \(\varvec{\theta } = (\varvec{\lambda }, \varvec{\theta }_1, \dots , \varvec{\theta }_K)'\) is the vector of all model parameters. Assuming that the \(\varvec{\theta }_i\) include location and scale parameters, \(\ell ^{\star }\) is plagued with “spikes”: it is an unbounded function with multiple maxima; see, e.g., Kiefer and Wolfowitz (1956). Hence, numerical maximization of (49) is prone (depending on factors such as the starting values and the employed numerical optimization method) to result in inaccurate, if not arbitrary, estimates. To avoid this problem, an augmented likelihood function is proposed in Broda et al. (2013). The idea is to remove unbounded states from the likelihood function by introducing a smoothing (shrinkage) term that, at its maximum, drives all components to act as one (irrespective of their assigned mixing weights), such that the mixture loses its otherwise inherently large flexibility. The suggested augmented likelihood function is given by
where \(\kappa \ge 0\) controls the shrinkage strength. If all component densities \(f_i\) are of the same type, larger values of \(\kappa \) lead to more similar parameter estimates across components, with identical estimates in the limit as \(\kappa \rightarrow \infty \). At \(\kappa =0\), (50) reduces to (49).
is termed the augmented likelihood estimator (ALE) and is consistent as \(T \rightarrow \infty \). By varying \(\kappa \), smooth density estimates can be enforced, even for small sample sizes. For the mixGAt with \(K=2\) and 250 observations, we find \(\kappa =10\) to be an adequate choice, which, in our empirical testing, guaranteed unimodal estimates in all cases, while still offering enough flexibility for accurate density fits, significantly better than those obtained with the single-component GAt.
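A minimal sketch of the augmented-likelihood idea follows, using a two-component normal mixture as a stand-in for the mixGAt. The quadratic shrinkage penalty is an assumed, illustrative choice, not the exact term of Broda et al. (2013); it likewise drives the components toward each other, ruling out the unbounded likelihood "spikes" for large \(\kappa \).

```python
import numpy as np
from scipy import stats, optimize

def neg_aug_loglik(p, x, kappa):
    # Penalized negative log-likelihood for a two-component normal mixture
    # (stand-in for the mixGAt). p = (logit weight, mean1, log-scale1,
    # mean2, log-scale2); the penalty shrinks the components together.
    lam = 1.0 / (1.0 + np.exp(-p[0]))             # mixing weight in (0, 1)
    m1, ls1, m2, ls2 = p[1], p[2], p[3], p[4]
    f = (lam * stats.norm.pdf(x, m1, np.exp(ls1))
         + (1.0 - lam) * stats.norm.pdf(x, m2, np.exp(ls2)))
    penalty = kappa * ((m1 - m2) ** 2 + (ls1 - ls2) ** 2)  # assumed form
    return penalty - np.sum(np.log(f))

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 180), rng.normal(0.0, 3.0, 70)])
p0 = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
res = optimize.minimize(neg_aug_loglik, p0, args=(x, 10.0),
                        method="Nelder-Mead",
                        options={"maxiter": 5000, "fatol": 1e-8, "xatol": 1e-8})
```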
A.4 The Asymmetric Stable Paretian
The fourth candidate we consider is the asymmetric non-Gaussian stable Paretian distribution, hereafter stable, with location \(\mu \), scale c, tail index \(\alpha \), and asymmetry parameter \(\beta \). We use the parametrization such that the mean, assuming \(\alpha >1\), is given by \(\mu \). (In Nolan 2015 and his accompanying software, this corresponds to the first parametrization; see also Zolotarev 1986 and Samorodnitsky and Taqqu 1994.)
This might at first seem like an odd candidate, given the historical difficulties in its estimation and the potentially problematic calculation of the ES, owing to the extraordinarily heavy-tailed nature of the distribution and the problems associated with calculating the density far into the tails; see, e.g., Paolella (2016) and the references therein. We circumvent both of these issues as follows. For estimation, we make use of the sample-characteristic-function estimator of Kogon and Williams (1998), which is fast to calculate and results in estimates very close in performance to the MLE. We use the function provided in John Nolan’s STABLE toolbox, which saves us the implementation; simulation easily confirms that his procedure is both correct and very fast. For the ES calculation, we first need the appropriate quantile, which is also implemented in Nolan’s toolbox. The ES integral can then be computed using the integral expression given in Stoyanov et al. (2006), which cleverly avoids integration into the tail.
This procedure, while feasible, is still too time consuming for our purposes. Instead, we use the same procedure employed in Krause and Paolella (2014) to generate a (massive) table in two dimensions (\(\alpha \) and \(\beta \)) that delivers the VaR (the required quantile) and the ES essentially instantaneously and with very high accuracy. There is one caveat with its use that requires remedying. It is well known, and as simulations quickly verify, that estimation of the asymmetry parameter \(\beta \) is subject to the most variation for any particular sample size. Relative to asymmetric Student’s \(t\) distributions, the extremely heavy tails of the stable distribution induce observations in small samples that have a relatively large impact on the estimation of \(\beta \). This is particularly acute when using a relatively small sample size of \(T=250\). As such, we recommend use of a simple shrinkage estimator with target zero and weight \(s_{\beta }\), namely \(\widehat{\beta } = s_{\beta } \widehat{\beta }_{\mathrm{MLE}}\). Some trial and error suggests \(s_{\beta }=0.3\) to be a reasonable choice for \(T=250\).
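The table lookup with \(\beta \)-shrinkage can be sketched as follows. The grid entries below are placeholders: in practice each cell would hold the standardized stable VaR or ES at the chosen level, precomputed once via the Stoyanov et al. (2006) integral expression.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Two-dimensional (alpha, beta) lookup table with placeholder ES values.
alphas = np.linspace(1.05, 2.0, 20)
betas  = np.linspace(-1.0, 1.0, 21)
A, Bg  = np.meshgrid(alphas, betas, indexing="ij")
es_grid = -(3.0 / (A - 1.0) + 0.5 * Bg)        # placeholder ES surface

es_lookup = RegularGridInterpolator((alphas, betas), es_grid)

def stable_es(alpha_hat, beta_hat, s_beta=0.3):
    # Shrink the noisy asymmetry estimate toward zero before the lookup:
    # beta_tilde = s_beta * beta_hat, with s_beta = 0.3 for T = 250.
    beta_tilde = s_beta * beta_hat
    return float(es_lookup([[alpha_hat, beta_tilde]])[0])
```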
The motivation for using the stable is the conservative nature of the delivered ES. In particular, the first three methods discussed are all based on asymmetric variations of the Student’s \(t\) distribution, which, while clearly heavy-tailed (it does not possess a moment generating function on an open neighborhood of zero), still potentially possesses a variance, whereas the stable does not, except in the measure-zero case \(\alpha =2\). As such, and because estimation is based on a finite amount of data, the ES delivered from the stable can be expected to be larger than those from the \(t\)-based models. This might be desirable when more conservative estimates of risk are called for, and can also be expected to affect the optimized portfolio vectors and the performance of the method.
A.5 Discussion of Portfolio Tail Behavior and ES
It is worth mentioning that the actual tail behavior of financial assets is not necessarily heavy-tailed; the discussion in Heyde and Kou (2004) should settle this point. This explains why, on the one hand, exponential-tailed distributions, such as the mixed normal, can deliver excellent VaR predictions; see, e.g., Haas et al. (2004), Haas et al. (2013), and Paolella (2013); while, on the other hand, stable-Paretian GARCH models also work admirably well; see e.g., Mittnik et al. (2002) and Mittnik and Paolella (2003).
Further, observe that the tail behavior associated with \({{P}}_{t+1 \mid t, \mathbf {w}}\), given the model and the parameters, is not subject to debate: by the nature of the model we employ, it involves convolutions of (dependent) random variables with power tails and, as such, will also have power tails, and will (presumably) be in the domain of attraction of a stable law. It is, however, analytically intractable. Observe that it is fallacious to argue that, because our model involves the (noncentral) Student’s \(t\), with estimated degrees of freedom parameters (after application of the APARCH filter) above two, the convolution will have a finite variance, and so the stable distribution cannot be considered. It is crucial to realize, first, that the model we employ is wrong with probability one (and is also subject to estimation error) and, second, that if an i.i.d. set of stable data with, say, \(\alpha =1.7\) is estimated as a location-scale Student’s \(t\) model, the resulting estimated degrees of freedom will not be below two, but rather closer to four.
As such, we believe it makes sense to consider several methods of determining the ES, and compare them in terms of portfolio performance.
A.6 Comparison of Methods
The computation times for estimating the model and evaluating mean and ES for each of the four methods discussed above were compared. Based on a sample size of \(s_1=1e3\), the NCT method requires, on average, 0.20 seconds. The GAt and mixGAt require 0.23 and 1.96 seconds, respectively, while the stable requires 0.00064 seconds. Generation of \(s_1=1e6\) (1e3) samples requires approximately 2769.34 (2.91) seconds, and the empirical calculation of the mean and ES based on \(s_1=1e6\) requires approximately 0.35 seconds. The bottleneck in the generation of samples is the evaluation of the NCT quantile function in (13). In summary, it is fastest to use \(s_1=1e3\) samples and one of the parametric methods to obtain the mean and ES.
We now wish to compare the ES values delivered by each of the methods. For this, we fix the portfolio vector \(\mathbf {w}\) to be equally weighted, and use 100 moving windows of data, each of length 250, and compute, for each method, the ES corresponding to the one-day-ahead predictive distribution and the fixed equally weighted portfolio. All the ES values (the empirically determined ones as well as the parametric ones) are based on (the same) 1e5 replications. The 100 windows have starting dates 8 August 2012 to 31 December 2012 and use the \(d=30\) constituents (as of April 2013) of the Dow Jones Industrial Average index from Wharton/CRSP. The values are shown in Fig. 8, and have been smoothed to enhance visibility. As expected, the stable ES values are larger than those delivered from the t-based models and also the empirically determined ES values. The mixGAt is the most flexible distribution and approximates the empirical ES nearly exactly, though it takes the longest to compute of the four parametric methods.
A.7 Calibrating the Number of Samples \(\mathbf {s_1}\)
As stated in Sect. 2.6, we wish to determine a heuristic for selecting the number of samples, \(s_1\), from the predictive copula distribution, in order to obtain the ES. This is conducted as follows. The copula model is estimated for all non-overlapping windows of length \(T=250\) based on the 30 components of the DJIA returns available from 4 Jan. 1993 to 31 Dec. 2012 and the ES of the predictive returns distribution for the equally weighted portfolio is computed. The goal is to determine an approximation to the smallest value of \(s_1\), say \(s_1^*\), such that the sampling variance of the ES determined from the parametric methods is less than some threshold. This value \(s_1^*\) is then linked to the tail thickness of the various predictive returns distributions over the non-overlapping windows.
To compute \(s_1^*\) for a particular data set, the ES is calculated \(n=50\) times for a fixed \(s_1\), based on simulation of the predictive returns distribution, using the NCT and stable parametric forms for its approximation. This is conducted for a range of \(s_1\) values, and \(s_1^*\) is taken to be the smallest number such that the sample variance is less than a threshold value. Figure 9 shows the results for selected values of \(s_1\) in the NCT case. As expected, the ES variances across rolling windows decrease as \(s_1\) increases. As can be seen from the middle right panels, a roughly linear relationship between \(s_1\) and the logarithm of the ES variance is obtained. The analysis was also conducted for the stable Paretian distribution, resulting in a similar plot (not shown).
A simple regression approach then yields the following. For a threshold of \(\exp (-2)\),
The resulting procedure is then: from an initial set of 300 copula samples, the ES is evaluated, \(s_1\) is computed from (51), and if \(s_1>300\), an additional \(s_1-300\) samples are drawn.
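A minimal sketch of this adaptive scheme, with `draw_returns` and `required_s1` as hypothetical placeholders (the latter standing in for the fitted regression (51), which is not reproduced here), and a plain empirical 1% ES used purely for illustration:

```python
import numpy as np

def adaptive_es_samples(draw_returns, required_s1, s_init=300):
    """Draw an initial batch of predictive samples, evaluate the ES,
    map it to a required sample size via `required_s1` (hypothetical
    stand-in for regression (51)), and top up the sample if needed.

    draw_returns : callable n -> array of n predictive portfolio returns
    required_s1  : callable es -> required number of samples s1
    """
    x = draw_returns(s_init)                    # initial 300 draws
    k = max(1, int(0.01 * len(x)))              # 1% lower tail
    es = -np.mean(np.sort(x)[:k])               # empirical ES (loss > 0)
    s1 = required_s1(es)
    if s1 > s_init:
        # draw the additional s1 - 300 samples
        x = np.concatenate([x, draw_returns(s1 - s_init)])
    return x, es
```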
B The Gaussian DCC-GARCH Model
Consider a d-dimensional vector of asset returns, \(\mathbf {Y}_t = \left( Y_{t,1},Y_{t,2},\ldots ,Y_{t,d} \right) '\). The ith univariate series, \(i=1,\ldots , d\), is assumed to follow a GARCH(1,1) model, which is a special case of (5). We assume an unknown mean \(\mu _i\), so that \(Y_{t,i} - \mu _i = \epsilon _{t,i} = Z_{t,i}\sigma _{t,i}\), \(\sigma ^2_{t,i} = c_{0,i} + c_{1,i} \left( Y_{t-1,i} - \mu _i \right) ^2 + d_{1,i} \sigma ^2_{t-1,i}\), and the \(Z_{t,i}\) are i.i.d. standard normal.
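A minimal sketch of this GARCH(1,1) variance filter for one margin (function and parameter names are illustrative, not from any library; the recursion is started from the sample unconditional variance, as discussed in Appendix B.1):

```python
import numpy as np

def garch11_filter(y, mu, c0, c1, d1, sigma2_0=None):
    """Filter the conditional variances sigma^2_t of a GARCH(1,1) margin.

    y          : array of returns Y_{t,i}
    mu         : mean mu_i
    c0, c1, d1 : GARCH(1,1) parameters (c0 > 0; c1, d1 >= 0)
    sigma2_0   : starting variance; defaults to the sample variance of y - mu
    """
    eps = y - mu                      # epsilon_t = Y_t - mu
    sigma2 = np.empty(len(y))
    s2_prev = np.var(eps) if sigma2_0 is None else sigma2_0
    e2_prev = s2_prev                 # eps_0^2 = kappa * sigma_0^2, kappa = 1
    for t in range(len(y)):
        # sigma^2_t = c0 + c1 * eps_{t-1}^2 + d1 * sigma^2_{t-1}
        sigma2[t] = c0 + c1 * e2_prev + d1 * s2_prev
        e2_prev, s2_prev = eps[t] ** 2, sigma2[t]
    return sigma2
```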
B.1 Estimation Using Profile Likelihood for Each GARCH Margin
The DCC multivariate structure can be expressed as
with \(\varvec{\mu }=(\mu _1, \ldots , \mu _d)'\), \(\mathbf {D}_t^2 = \mathrm{diag}([\sigma ^2_{t,1}, \ldots , \sigma ^2_{t,d} ])\), and \(\{\mathbf {R}_t\}\) the set of \(d\times d\) matrices of time varying conditional correlations with dynamics specified by
\(t=1,\ldots , T\), where \(\varvec{\epsilon }_t = \mathbf {D}^{-1}_t\left( \mathbf {Y}_t-\varvec{\mu }\right) \). The \(\{\mathbf {Q}_t\}\) form a sequence of conditional matrices parameterized by
with \(\mathbf {S}\) the \(d\times d\) unconditional correlation matrix (Engle 2002, p. 341) of the \(\varvec{\epsilon }_{t}\), and parameters a and b are estimated via maximum likelihood conditional on estimates of all other parameters, as discussed next. Matrices \(\mathbf {S}\) and \(\mathbf {Q}_{0}\) can be estimated with the usual plug-in sample correlation based on the filtered \(\varvec{\epsilon }_{t}\); see also Bali and Engle (2010) and Engle and Kelly (2012) on estimation of the DCC model. Observe that the resulting \(\mathbf {Q}_t\) from the update in (54) will not necessarily be precisely a correlation matrix; this is the reason for the standardization in (53). See Caporin and McAleer (2013) for several critiques of this DCC construction; and Aielli (2013) for a modified DCC model, termed cDCC, with potentially better small-sample properties. The CCC model is a special case of (52), with \(a=b=0\) in (54).
The mean vector, \(\varvec{\mu }\), can be set to zero, or estimated using the sample mean of the returns, as in Engle and Sheppard (2001) and McAleer et al. (2008), though, in a more general non-Gaussian context, it is best estimated jointly with the other parameters associated with each univariate return series; see Paolella and Polak (2017). Let \(\mathbf {Y} = [\mathbf {Y}_1, \ldots , \mathbf {Y}_T]'\), and denote the set of parameters as \(\varvec{\theta }\). The log-likelihood of the remaining parameters, conditional on \(\varvec{\mu }\), is given by
Then, as in Engle (2002), adding and subtracting \(\varvec{\epsilon }_{t}' \varvec{\epsilon }_{t}\), \(\ell \) can be decomposed as the sum of volatility and correlation terms, \(\ell = \ell _V + \ell _C\), where
so that a two-step maximum likelihood estimation procedure can be applied: First, estimate the GARCH model parameters for each univariate returns series and construct the standardized residuals; second, maximize the conditional likelihood with respect to parameters a and b in (54) based on the filtered residuals from the previous step. We now discuss this first step in more detail.
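Under the Gaussian assumption, the two steps can be sketched as follows. The function names are illustrative; for simplicity, the second-step maximization of \(\ell_C\) over \((a,b)\) is done here by a coarse grid search rather than a numerical optimizer, and \(\mathbf{S}\) and \(\mathbf{Q}_0\) are both set to the plug-in sample correlation of the filtered residuals.

```python
import numpy as np
from itertools import product

def dcc_filter(eps, a, b):
    """R_t recursion: Q_t = (1-a-b) S + a eps_{t-1} eps_{t-1}' + b Q_{t-1},
    with each Q_t standardized so that R_t is a correlation matrix."""
    T, d = eps.shape
    S = np.corrcoef(eps, rowvar=False)   # plug-in estimate of S; also Q_0
    Q = S.copy()
    R = np.empty((T, d, d))
    for t in range(T):
        if t > 0:
            Q = (1 - a - b) * S + a * np.outer(eps[t-1], eps[t-1]) + b * Q
        q = np.sqrt(np.diag(Q))
        R[t] = Q / np.outer(q, q)        # standardize: unit diagonal
    return R

def fit_ab(eps, a_grid, b_grid):
    """Step two: maximize the correlation term ell_C over (a, b) on a grid,
    the filtered residuals eps being fixed from step one."""
    best, best_ll = (0.0, 0.0), -np.inf
    for a, b in product(a_grid, b_grid):
        if a < 0 or b < 0 or a + b >= 1:   # stationarity constraint
            continue
        R = dcc_filter(eps, a, b)
        ll = 0.0
        for t, e in enumerate(eps):
            _, logdet = np.linalg.slogdet(R[t])
            # ell_C contribution: -(1/2)(log|R_t| + e' R_t^{-1} e - e' e)
            ll -= 0.5 * (logdet + e @ np.linalg.solve(R[t], e) - e @ e)
        if ll > best_ll:
            best, best_ll = (a, b), ll
    return best
```

Note that setting \(a=b=0\) recovers the CCC special case, for which every \(R_t\) equals the standardized \(\mathbf{S}\).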
While Francq and Zakoïan (2004) prove the consistency and asymptotic normality of the GARCH model parameter estimates, interest here centers on their numeric computation. Dropping the subscript i, the choice of starting values for \(\hat{c}_0\), \(\hat{c}_1\), and \(\hat{d}_1\) is important, as the log-likelihood can exhibit more than one local maximum. This issue of multiple maxima has been noted by Ma et al. (2006), Winker and Maringer (2009), and Paolella and Polak (2015b), though it seems to be often ignored, and can lead to inferior forecasts and jeopardize results in applied work. This unfortunate observation might help explain the results of Brooks et al. (2001, p. 54) in their extensive comparison of econometric software. In particular, they find that, with respect to estimating just the simple normal GARCH model, “the results produced using a default application of several of the most popular econometrics packages differ considerably from one another”. Another reason for discrepant results is the choice of \(\epsilon _0\) and \(\sigma _0\) to start the GARCH(1,1) recursion, for which several suggestions exist in the literature. We take \(\hat{\sigma }^2_0\) to be the sample unconditional variance of the series, and \(\hat{\epsilon }^2_0 = \kappa \hat{\sigma }^2_0\), where
depends on the density specification \(f_Z\left( \cdot \right) \) and is stated for the more general APARCH model (5). For \(Z \sim \text {N}(0,1)\), a trivial calculation yields
In our case, with \(\delta =2\) and \(g =0\), this reduces to \(\kappa = {\mathbb E}\big [ \left| Z \right| ^2 \big ] = 1\).
Paolella and Polak (2015b) demonstrate the phenomenon of multiple maxima with a real (and typical) data set, and propose a solution that is simple to implement, making use of the profile log-likelihood (p.l.) obtained by fixing the value of \(c_0\), and using a grid of points of \(c_0\) between zero and 1.1 times the sample variance of the series. That is, for a fixed value of \(c_0\), we compute
To obtain (with high probability) the global maximum, the following procedure suggests itself: (i) Based on a set of \(c_0\) values, compute (56); (ii) take the value of \(c_0\) from the set, say \(c_0^{*}\), and its corresponding \(\widehat{\varvec{\theta }}_{\mathrm{p.l.}}(c_0^{*})\) that results in the largest log-likelihood as starting values, to (iii) estimate the full model. The finer the grid, the higher the probability of reaching the global maximum; some trials suggest that a grid of length 10 is adequate. The use of more parameters, as arise with more elaborate GARCH structures such as the APARCH formulation, or additional shape parameter(s) of a non-Gaussian distribution such as the NCT or stable Paretian, can further exacerbate the problem of multiple local maxima of the likelihood.
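A sketch of this profile-likelihood starting-value search follows. For illustration, the inner maximization of (56) over \((c_1, d_1)\) for fixed \(c_0\) is done by a coarse inner grid rather than a numerical optimizer, a Gaussian likelihood is assumed, and the function names are ours:

```python
import numpy as np

def garch_ll(eps, c0, c1, d1):
    """Gaussian GARCH(1,1) log-likelihood for demeaned returns eps,
    with the recursion started at the sample unconditional variance."""
    s2_prev = e2_prev = np.var(eps)
    ll = 0.0
    for e in eps:
        s2 = c0 + c1 * e2_prev + d1 * s2_prev
        ll += -0.5 * (np.log(2 * np.pi * s2) + e**2 / s2)
        e2_prev, s2_prev = e**2, s2
    return ll

def profile_start_values(eps, n_grid=10):
    """For each c0 on a grid between 0 and 1.1 times the sample variance,
    maximize the likelihood over (c1, d1) (coarse inner grid here), and
    return the best (c0, c1, d1) as starting values for full estimation."""
    c0_grid = np.linspace(1e-6, 1.1 * np.var(eps), n_grid)
    inner = [(c1, d1)
             for c1 in np.arange(0.0, 0.31, 0.05)
             for d1 in np.arange(0.5, 1.0, 0.05)
             if c1 + d1 < 1]                     # covariance stationarity
    best, best_ll = None, -np.inf
    for c0 in c0_grid:
        for c1, d1 in inner:
            ll = garch_ll(eps, c0, c1, d1)
            if ll > best_ll:
                best, best_ll = (c0, c1, d1), ll
    return best
```

The returned triple would then serve as the starting value for step (iii), full numerical maximization of the unrestricted likelihood.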
B.2 Remarks on DCC
One might argue that only two parameters will not be adequate for modeling the evolution of an entire correlation matrix. While this is certainly true, the models of Engle (2002) and Tse and Tsui (2002) have two strong points: First, their use is perhaps better than no parameters (as in the CCC model), and second, they allow for easy implementation and estimation. Generalizations of the simple DCC structure that allow the number of parameters to be a function of d, as well as asymmetric extensions of the DCC idea, are considered in Engle (2002) and Cappiello et al. (2006), though with a potentially very large number of parameters, the usual estimation and inferential problems arise.
Bauwens and Rombouts (2007) consider an approach in which similar series are pooled into one of a small number of clusters, such that their GARCH parameters are the same within a cluster. A related idea is to group series with respect to their correlations, generalizing the DCC model; see, e.g., Vargas (2006), Billio et al. (2006), Zhou and Chan (2008), Billio and Caporin (2009), Engle and Kelly (2012), So and Yip (2012), Aielli and Caporin (2013), and the references therein.
An alternative approach is to assume a Markov switching structure between two (or more) regimes, each of which has a CCC structure, as first proposed in Pelletier (2006), and augmented to the non-Gaussian case in Paolella et al. (2017). Such a construction implies many additional parameters, but their estimation makes use of the usual sample correlation estimator, thus avoiding the curse of dimensionality, and shrinkage estimation can be straightforwardly invoked to improve performance. The idea is that, for a given time segment, the correlations are constant, taking on one of usually two, or at most three, sets of values. This appears to be better than attempting to construct a model that allows for their variation at every point in time; the latter might be “asking too much of the data” and inundated with too many parameters. Paolella et al. (2017) demonstrate strong out-of-sample performance of their non-Gaussian Markov switching CCC model with two regimes, compared to the Gaussian CCC case, the Gaussian CCC switching case, the Gaussian DCC model, and the non-Gaussian single-component CCC of Paolella and Polak (2015b).
© 2018 Springer International Publishing AG
Paolella, M.S., Polak, P. (2018). COBra: Copula-Based Portfolio Optimization. In: Kreinovich, V., Sriboonchitta, S., Chakpitak, N. (eds) Predictive Econometrics and Big Data. TES 2018. Studies in Computational Intelligence, vol 753. Springer, Cham. https://doi.org/10.1007/978-3-319-70942-0_3
Print ISBN: 978-3-319-70941-3
Online ISBN: 978-3-319-70942-0