Abstract
Copulas are distribution functions with standard uniform univariate marginals. They are widely used for studying dependence among continuously distributed random variables, with applications in finance and quantitative risk management; see, e.g., the pricing of collateralized debt obligations (Hofert and Scherer, Quantitative Finance, 11(5), 775–787, 2011). Modeling complex dependence structures among variables has recently become increasingly popular in statistics, one example being data mining (e.g., cluster analysis, evolutionary algorithms or classification). The present work considers an estimator for both the structure and the parameters of hierarchical Archimedean copulas. Such copulas have recently become popular alternatives to the widely used Gaussian copulas. The proposed estimator is based on the pairwise inversion of Kendall’s tau estimator recently considered in the literature, but it can also be based on other estimators, such as likelihood-based ones. A simple algorithm implementing the proposed estimator is provided. Its performance is investigated in several experiments, including a comparison with other available estimators. The results show that the proposed estimator can be a suitable alternative in terms of goodness of fit and computational efficiency. Additionally, an application of the estimator to copula-based Bayesian classification is presented. A set of new Archimedean and hierarchical Archimedean copula-based Bayesian classifiers is compared with other commonly known classifiers in terms of accuracy on several well-known datasets. The results show that, on low-dimensional data, the hierarchical Archimedean copula-based Bayesian classifiers achieve accuracy similar to that of highly accurate classifiers such as support vector machines or ensemble methods, while keeping the produced models rather comprehensible; their applicability to high-dimensional data is limited by their high computational cost.
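The pairwise inversion of Kendall’s tau mentioned in the abstract can be illustrated for a single copula family. The following is a minimal sketch, not the algorithm of the paper: it assumes the Clayton family, for which tau = theta / (theta + 2) and hence theta = 2 tau / (1 - tau), and it only computes one parameter estimate per pair of variables without determining any hierarchical structure. All function names are hypothetical and introduced for illustration only.

```python
import itertools
import random


def kendall_tau(x, y):
    """Sample Kendall's tau by counting concordant minus discordant pairs (O(n^2))."""
    n = len(x)
    score = 0
    for i, j in itertools.combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


def clayton_theta_from_tau(tau):
    """Invert tau = theta / (theta + 2); the Clayton family is an assumption here."""
    if not 0.0 < tau < 1.0:
        raise ValueError("Clayton inversion needs 0 < tau < 1")
    return 2.0 * tau / (1.0 - tau)


def pairwise_clayton_thetas(columns):
    """Return one inverted-tau estimate per pair (i, j) of variables, i < j."""
    estimates = {}
    for i, j in itertools.combinations(range(len(columns)), 2):
        tau = kendall_tau(columns[i], columns[j])
        estimates[(i, j)] = clayton_theta_from_tau(tau)
    return estimates


def sample_bivariate_clayton(theta, n, rng):
    """Draw n pairs from a bivariate Clayton copula via the conditional method."""
    us, vs = [], []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids u == 0
        w = 1.0 - rng.random()
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        us.append(u)
        vs.append(v)
    return us, vs


# Simulated data with true theta = 2 (i.e. tau = 0.5), then pairwise estimation.
rng = random.Random(0)
u, v = sample_bivariate_clayton(theta=2.0, n=500, rng=rng)
print(pairwise_clayton_thetas([u, v]))
```

For more than two variables, `pairwise_clayton_thetas` returns an estimate for every pair of columns; how such pairwise estimates are combined into a hierarchical structure is the subject of the paper and is not reproduced here.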
Notes
also called fully-nested Archimedean copula
also called partially-nested Archimedean copula
References
Aas, K., Czado, C., Frigessi, A., Bakken, H. (2009). Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics, 44(2), 182–198.
Alcalá, J., Fernández, A., Luengo, J., Derrac, J., García, S., Sánchez, L., Herrera, F. (2010). Keel data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. Journal of Multiple-Valued Logic and Soft Computing, 17, 255–287.
Bache, K., & Lichman, M. (2013). UCI machine learning repository. http://archive.ics.uci.edu/ml.
Berg, D. (2009). Copula goodness-of-fit testing: an overview and power comparison. The European Journal of Finance, 15(7–8), 675–701.
Bouyé, E., Durrleman, V., Nikeghbali, A., Riboulet, G., Roncalli, T. (2000). Copulas for finance - a reading guide and some applications. Available at SSRN 1032533.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
Breiman, L., Friedman, J., Olshen, R., Stone, C. (1984). Classification and Regression Trees. Wadsworth.
Chen, X., Fan, Y., Patton, A.J. (2004). Simple tests for models of dependence between multiple financial time series, with applications to US equity returns and exchange rates. Discussion paper 483, Financial Markets Group, London School of Economics.
Clarke, B., Fokoue, E., Zhang, H.H. (2009). Principles and Theory for Data Mining and Machine Learning. Springer.
Cramér, H. (1928). On the composition of elementary errors: First paper: Mathematical deductions. Scandinavian Actuarial Journal, 1928(1), 13–74.
Cuvelier, E., & Noirhomme-Fraiture, M. (2005). Clayton copula and mixture decomposition. In Applied Stochastic Models and Data Analysis, ASMDA’05. Brest.
Freund, Y., & Schapire, R.E. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. In Computational learning theory, (pp. 23–37). Springer.
Genest, C., & Favre, A. (2007). Everything you always wanted to know about copula modeling but were afraid to ask. Journal of Hydrologic Engineering, 12, 347–368.
Genest, C., & Rémillard, B. (2008). Validity of the parametric bootstrap for goodness-of-fit testing in semiparametric models. In Annales de l’Institut Henri Poincaré: Probabilités et Statistiques, (Vol. 44, pp. 1096–1127).
Genest, C., Rémillard, B., Beaudoin, D. (2009). Goodness-of-fit tests for copulas: A review and a power study. Insurance: Mathematics and Economics, 44(2), 199–213.
Genest, C., & Rivest, L.P. (1993). Statistical inference procedures for bivariate Archimedean copulas. Journal of the American Statistical Association, 88(423), 1034–1043.
González-Fernández, Y., & Soto, M. (2012). Copulaedas: An R package for estimation of distribution algorithms based on copulas. arXiv:1209.5429.
Górecki, J., Hofert, M., Holeňa, M. (2014). On the consistency of an estimator for hierarchical Archimedean copulas. In Talašová, J., Stoklasa, J., Talášek, T. (Eds.) 32nd International Conference on Mathematical Methods in Economics, (pp. 239–244). Olomouc: Palacký University.
Górecki, J., & Holeňa, M. (2013). An alternative approach to the structure determination of hierarchical Archimedean copulas. In Proceedings of the 31st International Conference on Mathematical Methods in Economics (MME 2013) (pp. 201–206). Jihlava.
Górecki, J., & Holeňa, M. (2014). Structure determination and estimation of hierarchical Archimedean copulas based on Kendall correlation matrix. In Appice, A., Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z.W. (Eds.) New Frontiers in Mining Complex Patterns, Lecture Notes in Computer Science, (pp. 132–147).
Hofert, M. (2010a). Construction and sampling of nested Archimedean copulas. In Jaworski, P., Durante, F., Härdle, W.K., Rychlik, T. (Eds.), Copula Theory and Its Applications, Lecture Notes in Statistics, vol 198, (pp. 147–160). Springer Berlin Heidelberg.
Hofert, M. (2010b). Sampling Nested Archimedean Copulas with Applications to CDO Pricing. Suedwestdeutscher Verlag fuer Hochschulschriften.
Hofert, M. (2011). Efficiently sampling nested Archimedean copulas. Computational Statistics and Data Analysis, 55(1), 57–70.
Hofert, M. (2012). A stochastic representation and sampling algorithm for nested Archimedean copulas. Journal of Statistical Computation and Simulation, 82(9), 1239–1255. doi:10.1080/00949655.2011.574632.
Hofert, M., Mächler, M., McNeil, A.J. (2012). Likelihood inference for Archimedean copulas in high dimensions under known margins. Journal of Multivariate Analysis, 110, 133–150.
Hofert, M., Mächler, M., McNeil, A.J. (2013). Archimedean copulas in high dimensions: Estimators and numerical challenges motivated by financial applications. Journal de la Société Française de Statistique, 154(1), 25–63.
Hofert, M., & Scherer, M. (2011). CDO pricing with nested Archimedean copulas. Quantitative Finance, 11(5), 775–787.
Holeňa, M., & Ščavnický, M. (2013). Application of copulas to data mining based on observational logic. In ITAT: Information Technologies Applications and Theory Workshops, Posters, and Tutorials, North Charleston: CreateSpace Independent Publishing Platform, Donovaly Slovakia.
Joe, H. (1997). Multivariate Models and Dependence Concepts. London: Chapman & Hall.
Kao, S.C., Ganguly, A.R., Steinhaeuser, K. (2009). Motivating complex dependence structures in data mining: A case study with anomaly detection in climate. In International Conference on Data Mining Workshops, (pp. 223–230). doi:10.1109/ICDMW.2009.37.
Kao, S.C., & Govindaraju, R.S. (2008). Trivariate statistical analysis of extreme rainfall events via Plackett family of copulas. Water Resources Research, 44.
Kojadinovic, I. (2010). Hierarchical clustering of continuous variables based on the empirical copula process and permutation linkages. Computational Statistics & Data Analysis, 54(1), 90–108.
Kojadinovic, I., & Yan, J. (2010a). Comparison of three semiparametric methods for estimating dependence parameters in copula models. Insurance: Mathematics and Economics, 47, 52–63.
Kojadinovic, I., & Yan, J. (2010b). Modeling multivariate distributions with continuous margins using the copula R package. Journal of Statistical Software, 34(9), 1–20.
Kuhn, G., Khan, S., Ganguly, A.R., Branstetter, M.L. (2007). Geospatial-temporal dependence among weekly precipitation extremes with applications to observations and climate model simulations in South America. Advances in Water Resources, 30(12), 2401–2423.
Lachenbruch, P.A. (1975). Discriminant analysis. Wiley Online Library.
Lascio, F., & Giannerini, S. (2012). A copula-based algorithm for discovering patterns of dependent observations. Journal of Classification, 29, 50–75. doi:10.1007/s00357-012-9099-y.
Maity, R., & Kumar, D.N. (2008). Probabilistic prediction of hydroclimatic variables with nonparametric quantification of uncertainty. Journal of Geophysical Research, 113.
McNeil, A.J. (2008). Sampling nested Archimedean copulas. Journal of Statistical Computation and Simulation, 78(6), 567–581.
McNeil, A.J., & Nešlehová, J. (2009). Multivariate Archimedean copulas, d-monotone functions and ℓ1-norm symmetric distributions. The Annals of Statistics, 37, 3059–3097.
Moehmel, S., Steinfeldt, N., Engelschalt, S., Holena, M., Kolf, S., Baerns, M., Dingerdissen, U., Wolf, D., Weber, R., Bewersdorf, M. (2008). New catalytic materials for the high-temperature synthesis of hydrocyanic acid from methane and ammonia by high-throughput approach. Applied Catalysis A: General, 334(1), 73–83.
Nelsen, R. (2006). An Introduction to Copulas, 2nd edn. Springer.
Okhrin, O., Okhrin, Y., Schmid, W. (2013a). On the structure and estimation of hierarchical Archimedean copulas. Journal of Econometrics, 173(2), 189–204. http://www.sciencedirect.com/science/article/pii/S0304407612002667.
Okhrin, O., Okhrin, Y., Schmid, W. (2013b). Properties of hierarchical Archimedean copulas. Statistics & Risk Modeling, 30(1), 21–54.
Okhrin, O., & Ristig, A. (2014). Hierarchical Archimedean copulae: The HAC package. Journal of Statistical Software, 58(4). http://www.jstatsoft.org/v58/i04.
Rey, M., & Roth, V. (2012). Copula mixture model for dependency-seeking clustering. Preprint. arXiv:1206.6433
Sathe, S. (2006). A novel Bayesian classifier using copula functions. Preprint arXiv:cs/0611150.
Savu, C., & Trede, M. (2008). Goodness-of-fit tests for parametric families of Archimedean copulas. Quantitative Finance, 8(2), 109–116.
Savu, C., & Trede, M. (2010). Hierarchies of Archimedean copulas. Quantitative Finance, 10, 295–304.
Segers, J., & Uyttendaele, N. (2014). Nonparametric estimation of the tree structure of a nested Archimedean copula. Computational Statistics & Data Analysis, 72, 190–204.
Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l’Institut de Statistique de l’Université de Paris, 8, 229–231.
Smith, M.S., Gan, Q., Kohn, R.J. (2012). Modelling dependence using skew t copulas: Bayesian inference and applications. Journal of Applied Econometrics, 27(3), 500–522.
Vapnik, V. (2000). The nature of statistical learning theory. Springer.
Wang, L., Guo, X., Zeng, J., Hong, Y. (2012). Copula estimation of distribution algorithms based on exchangeable Archimedean copula. International Journal of Computer Applications in Technology, 43, 13–20. doi:10.1504/IJCAT.2012.045836, http://inderscience.metapress.com/content/42R4M650P16V1227.
Wolpert, D.H. (2002). The supervised learning no-free-lunch theorems. In Soft Computing and Industry (pp. 25–42). Springer.
Yuan, A., Chen, G., Zhou, Z.C., Bonney, G., Rotimi, C. (2008). Gene copy number analysis for family data using semiparametric copula model. Bioinformatics and Biology Insights, 2, 343–355.
Additional information
The work of Jan Górecki was funded by the project SGS/21/2014 - Advanced methods for knowledge discovery from data and their application in expert systems, Czech Republic. The work of Martin Holeňa was funded by the Czech Science Foundation (GA ČR) grant 13-17187S.
About this article
Cite this article
Górecki, J., Hofert, M. & Holeňa, M. An approach to structure determination and estimation of hierarchical Archimedean Copulas and its application to Bayesian classification. J Intell Inf Syst 46, 21–59 (2016). https://doi.org/10.1007/s10844-014-0350-3
DOI: https://doi.org/10.1007/s10844-014-0350-3