Panel data analysis: a survey on model-based clustering of time series

Invited Article
Advances in Data Analysis and Classification

Abstract

Clustering is a widely used statistical tool for determining subsets in a given data set. Frequently used clustering methods are mostly based on distance measures and cannot easily be extended to cluster time series within a panel or longitudinal data set. The paper reviews recently suggested approaches to model-based clustering of panel or longitudinal data based on finite mixture models. Several approaches are considered that are suitable for both continuous and categorical time series observations. Bayesian estimation through Markov chain Monte Carlo methods is described in detail, and various criteria for selecting the number of clusters are reviewed. An application to panel data on marijuana use among teenagers serves as an illustration.
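To make the idea of MCMC estimation for a finite mixture of categorical time series concrete, here is a minimal sketch, assuming each cluster is a first-order Markov chain with its own transition matrix (Markov chain clustering, one way to cluster categorical time series). It is not the survey's code: the function names, the Dirichlet(e0) prior on the weights with e0 = 4, the uniform Dirichlet prior on each transition row, and conditioning on the first observation are all illustrative assumptions. The sketch fixes the number of clusters K and ignores label switching; it simply returns the last Gibbs draw.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma(a, 1) draws."""
    g = [max(rng.gammavariate(a, 1.0), 1e-300) for a in alpha]  # floor avoids log(0)
    total = sum(g)
    return [x / total for x in g]


def transition_counts(seq, n_states):
    """Count first-order transitions y[t-1] -> y[t] in one categorical time series."""
    counts = [[0] * n_states for _ in range(n_states)]
    for prev, cur in zip(seq[:-1], seq[1:]):
        counts[prev][cur] += 1
    return counts


def markov_chain_clustering(series, n_states, n_clusters, n_iter=200, e0=4.0, seed=0):
    """Hypothetical Gibbs sampler for a finite mixture of first-order Markov chains.

    Cluster k has weight eta[k] and transition matrix xi[k]; the first
    observation of each series is conditioned on. Returns the last draw.
    """
    rng = random.Random(seed)
    counts = [transition_counts(s, n_states) for s in series]
    z = [rng.randrange(n_clusters) for _ in series]  # random initial allocation
    for _ in range(n_iter):
        # Step 1: weights eta | z ~ Dirichlet(e0 + N_k), N_k = cluster sizes
        sizes = [z.count(k) for k in range(n_clusters)]
        eta = sample_dirichlet([e0 + n for n in sizes], rng)
        # Step 2: each row of xi[k] | z ~ Dirichlet(1 + pooled transition counts of cluster k)
        xi = []
        for k in range(n_clusters):
            pooled = [[0] * n_states for _ in range(n_states)]
            for c, zi in zip(counts, z):
                if zi == k:
                    for j in range(n_states):
                        for l in range(n_states):
                            pooled[j][l] += c[j][l]
            xi.append([sample_dirichlet([1.0 + n for n in row], rng) for row in pooled])
        # Step 3: z_i | eta, xi with P(z_i = k) proportional to eta_k * prod_t xi_k[y_{t-1}, y_t]
        for i, c in enumerate(counts):
            logw = [
                math.log(eta[k])
                + sum(c[j][l] * math.log(xi[k][j][l])
                      for j in range(n_states) for l in range(n_states))
                for k in range(n_clusters)
            ]
            m = max(logw)  # subtract the maximum before exponentiating for stability
            w = [math.exp(v - m) for v in logw]
            z[i] = rng.choices(range(n_clusters), weights=w)[0]
    return z, eta, xi


if __name__ == "__main__":
    # Synthetic toy panel (not the marijuana data): 10 "sticky" and 10 "alternating" binary series.
    sticky = [[0] * 30 for _ in range(10)]
    alternating = [[t % 2 for t in range(30)] for _ in range(10)]
    z, eta, xi = markov_chain_clustering(sticky + alternating, n_states=2, n_clusters=2)
    print(z)
```

On this synthetic panel the two groups have very different transition behaviour, so the sampler should allocate them to different clusters. In a real application one would run several values of K, compare them with a model selection criterion, and post-process the draws to handle label switching rather than relying on a single final draw.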


Author information

Correspondence to Sylvia Frühwirth-Schnatter.


Cite this article

Frühwirth-Schnatter, S. Panel data analysis: a survey on model-based clustering of time series. Adv Data Anal Classif 5, 251–280 (2011). https://doi.org/10.1007/s11634-011-0100-0
