A Copula-Based Algorithm for Discovering Patterns of Dependent Observations

Di Lascio, F. Marta L.; Giannerini, Simone

doi:10.1007/s00357-012-9099-y

Published: 11 January 2012

Volume 29, pages 50–75, (2012)

Journal of Classification

F. Marta L. Di Lascio¹ &
Simone Giannerini¹


Abstract

The main aim of this work is to study the clustering of dependent data by means of copula functions. Copulas are popular multivariate tools whose importance within clustering methods has not yet been investigated in detail. We propose a new algorithm (CoClust for short) that clusters dependent data according to the multivariate structure of the generating process, without any assumption on the margins. Moreover, the approach requires neither a starting classification nor an a priori choice of the number of clusters; instead, CoClust selects them using a criterion based on the log-likelihood of a copula fit. We test our proposal on simulated data under different dependence scenarios and compare it with a model-based clustering technique. Finally, we show applications of CoClust to real microarray data from breast-cancer patients.
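
To make the phrase "log-likelihood of a copula fit" concrete, the following is a minimal sketch and not the CoClust algorithm itself. It assumes a bivariate Gaussian copula, rank-based pseudo-observations, and a simple normal-scores correlation as a plug-in estimate of the copula parameter; none of these choices is taken from the article, and the function names are hypothetical. It illustrates two ideas from the abstract: the criterion does not depend on the margins, and dependent series yield a larger copula log-likelihood than independent ones.

```python
import math
import random
from statistics import NormalDist


def pseudo_observations(x):
    """Map a sample to (0, 1) through scaled ranks, rank / (n + 1)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def gaussian_copula_loglik(x, y):
    """Return (r, loglik) for a bivariate Gaussian copula fitted to (x, y).

    Assumption: r is the correlation of the normal scores, a plug-in
    estimate used here for simplicity (not a maximum-likelihood fit).
    """
    std_normal = NormalDist()
    z1 = [std_normal.inv_cdf(u) for u in pseudo_observations(x)]
    z2 = [std_normal.inv_cdf(u) for u in pseudo_observations(y)]
    n = len(z1)
    m1, m2 = sum(z1) / n, sum(z2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(z1, z2))
    s11 = sum((a - m1) ** 2 for a in z1)
    s22 = sum((b - m2) ** 2 for b in z2)
    r = s12 / math.sqrt(s11 * s22)
    # Log-density of the Gaussian copula, summed over observations.
    loglik = 0.0
    for a, b in zip(z1, z2):
        quad = r * r * (a * a + b * b) - 2.0 * r * a * b
        loglik += -0.5 * math.log(1.0 - r * r) - quad / (2.0 * (1.0 - r * r))
    return r, loglik


# Simulated data (illustrative only): one dependent and one independent pair.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y_dep = [0.8 * v + 0.6 * rng.gauss(0, 1) for v in x]
y_ind = [rng.gauss(0, 1) for _ in range(200)]

r_dep, ll_dep = gaussian_copula_loglik(x, y_dep)
r_ind, ll_ind = gaussian_copula_loglik(x, y_ind)
# A strictly increasing transform of a margin leaves the ranks unchanged.
r_exp, ll_exp = gaussian_copula_loglik([math.exp(v) for v in x], y_dep)

print(f"dependent:   r={r_dep:.3f}, loglik={ll_dep:.2f}")
print(f"independent: r={r_ind:.3f}, loglik={ll_ind:.2f}")
print(f"exp margin:  r={r_exp:.3f}, loglik={ll_exp:.2f}")
```

Because the fit uses only ranks, replacing `x` by `exp(x)` gives the same log-likelihood, which is one way to read "without any assumption on the margins". A selection criterion in this spirit would prefer the grouping with the larger log-likelihood.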

Author information

Authors and Affiliations

Dipartimento di Scienze Statistiche, Università di Bologna, via Belle Arti 41, 40126, Bologna, Italy
F. Marta L. Di Lascio & Simone Giannerini

Corresponding author

Correspondence to F. Marta L. Di Lascio.

Additional information

The authors wish to thank Estela Bee Dagum, Paola Monari and Alessandra Luati for their support. This work has been partially financed by MIUR funds. Supplementary material and the R package CoClust are available at http://www2.stat.unibo.it/giannerini/coclust.

About this article

Cite this article

Di Lascio, F.M.L., Giannerini, S. A Copula-Based Algorithm for Discovering Patterns of Dependent Observations. J Classif 29, 50–75 (2012). https://doi.org/10.1007/s00357-012-9099-y

Issue Date: April 2012
