A Copula-Based Algorithm for Discovering Patterns of Dependent Observations

Journal of Classification

Abstract

The main aim of this work is to study the clustering of dependent data by means of copula functions. Copulas are popular multivariate tools whose role in clustering methods has not yet been investigated in detail. We propose a new algorithm (CoClust, in brief) that clusters dependent data according to the multivariate structure of the generating process without any assumption on the margins. Moreover, the approach requires neither a starting classification nor an a priori choice of the number of clusters; CoClust selects both by using a criterion based on the log-likelihood of a copula fit. We test our proposal on simulated data for different dependence scenarios and compare it with a model-based clustering technique. Finally, we show applications of CoClust to real microarray data of breast-cancer patients.
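As a rough illustration of the kind of criterion the abstract mentions, the sketch below computes the log-likelihood of a copula fit for a pair of variables without modelling their margins. It is not the CoClust algorithm or the CoClust R package. The function names, the choice of a bivariate Gaussian copula, the rank-based pseudo-observations, and the normal-scores correlation used to estimate the dependence parameter are all assumptions made for this example.

```python
import math
import random
from statistics import NormalDist


def pseudo_observations(sample):
    """Scaled ranks in (0, 1); they stand in for the unknown marginal CDFs."""
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def gaussian_copula_loglik(x, y):
    """Log-likelihood of a bivariate Gaussian copula fitted to the pair (x, y)."""
    inv = NormalDist().inv_cdf
    zx = [inv(u) for u in pseudo_observations(x)]
    zy = [inv(u) for u in pseudo_observations(y)]
    sxx = sum(a * a for a in zx)
    syy = sum(b * b for b in zy)
    sxy = sum(a * b for a, b in zip(zx, zy))
    # Simple estimate of rho from the normal scores (an assumption of this
    # sketch), clamped so that the log-density stays finite.
    rho = max(-0.999, min(0.999, sxy / math.sqrt(sxx * syy)))
    quad = (rho * rho * (sxx + syy) - 2.0 * rho * sxy) / (2.0 * (1.0 - rho * rho))
    return -0.5 * len(x) * math.log(1.0 - rho * rho) - quad


# Hypothetical simulated data: one dependent pair and one independent pair.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(200)]
y_dep = [math.exp(a + 0.3 * random.gauss(0.0, 1.0)) for a in x]  # non-normal margin
y_ind = [random.gauss(0.0, 1.0) for _ in range(200)]

ll_dep = gaussian_copula_loglik(x, y_dep)
ll_ind = gaussian_copula_loglik(x, y_ind)
print(ll_dep, ll_ind)
```

Because only ranks enter the computation, a strictly increasing transformation of either variable leaves the value unchanged, which is the sense in which no assumption on the margins is needed. A larger log-likelihood indicates that the fitted copula captures more of the dependence between the two variables.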

Author information

Corresponding author

Correspondence to F. Marta L. Di Lascio.

Additional information

The authors wish to thank Estela Bee Dagum, Paola Monari and Alessandra Luati for their support. This work has been partially financed by MIUR funds. Supplementary material and the R package CoClust are available at http://www2.stat.unibo.it/giannerini/coclust.

About this article

Cite this article

Di Lascio, F.M.L., Giannerini, S. A Copula-Based Algorithm for Discovering Patterns of Dependent Observations. J Classif 29, 50–75 (2012). https://doi.org/10.1007/s00357-012-9099-y

  • DOI: https://doi.org/10.1007/s00357-012-9099-y
