Comparing Several Parametric and Nonparametric Approaches to Time Series Clustering: A Simulation Study

Journal of Classification

Abstract

A key step in cluster analysis is choosing a similarity or dissimilarity measure between data objects. For time series, similarity can be defined in several ways. In this paper, several nonparametric statistics originally designed to test the equality of the log-spectra of two stochastic processes are proposed as dissimilarity measures between time series. Their behavior in time series clustering is analyzed through a simulation study and compared with the performance of several model-free and model-based dissimilarity measures. Three classification settings were considered: (i) distinguishing between stationary and non-stationary time series, (ii) classifying different ARMA processes, and (iii) classifying several non-linear time series models. As expected, the performance of a given dissimilarity measure depended strongly on the type of processes being clustered. Among all the measures studied, the nonparametric distances showed the most robust behavior.
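
To make the idea of a log-spectrum-based dissimilarity concrete, the following is a minimal sketch in Python. It is not one of the test statistics proposed in the paper. As a stand-in, it uses the root-mean-square difference between raw log-periodograms, which is a simple model-free spectral dissimilarity. The function names (`simulate_ar1`, `log_periodogram`, `log_spectral_distance`), the AR(1) coefficients, the series length, and the seed are all assumptions chosen for illustration.

```python
import cmath
import math
import random


def simulate_ar1(phi, n, rng, burn_in=100):
    """Simulate x_t = phi * x_{t-1} + e_t with standard Gaussian noise."""
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:  # discard the burn-in so the start value is forgotten
            out.append(x)
    return out


def log_periodogram(series):
    """Log-periodogram at Fourier frequencies 2*pi*k/n, k = 1..floor((n-1)/2)."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    values = []
    for k in range(1, (n - 1) // 2 + 1):
        w = 2.0 * math.pi * k / n
        dft = sum(v * cmath.exp(-1j * w * t) for t, v in enumerate(centered))
        power = abs(dft) ** 2 / (2.0 * math.pi * n)
        values.append(math.log(max(power, 1e-300)))  # guard against log(0)
    return values


def log_spectral_distance(lp_a, lp_b):
    """Root-mean-square difference between two log-periodograms."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(lp_a, lp_b)) / len(lp_a))


# Illustrative setup (assumed, not from the paper): two groups of AR(1) series.
rng = random.Random(0)
labels = [0, 0, 0, 1, 1, 1]
series = [simulate_ar1(0.8 if g == 0 else -0.8, 128, rng) for g in labels]
log_pgrams = [log_periodogram(s) for s in series]

# Pairwise dissimilarity matrix, the input a clustering algorithm would use.
m = len(series)
dist = [[log_spectral_distance(log_pgrams[i], log_pgrams[j]) for j in range(m)]
        for i in range(m)]

# Compare average dissimilarity within groups and between groups.
pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
within_vals = [dist[i][j] for i, j in pairs if labels[i] == labels[j]]
between_vals = [dist[i][j] for i, j in pairs if labels[i] != labels[j]]
within = sum(within_vals) / len(within_vals)
between = sum(between_vals) / len(between_vals)
print(f"mean within-group distance:  {within:.3f}")
print(f"mean between-group distance: {between:.3f}")
```

The matrix `dist` could be passed to any clustering procedure that accepts pairwise dissimilarities, such as hierarchical clustering. The raw periodogram is a noisy spectral estimate, so smoothed estimates would likely give more stable distances.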

Author information

Corresponding author

Correspondence to Sonia Pértega Díaz.

Cite this article

Díaz, S.P., Vilar, J.A. Comparing Several Parametric and Nonparametric Approaches to Time Series Clustering: A Simulation Study. J Classif 27, 333–362 (2010). https://doi.org/10.1007/s00357-010-9064-6

  • DOI: https://doi.org/10.1007/s00357-010-9064-6
