Comparing Several Parametric and Nonparametric Approaches to Time Series Clustering: A Simulation Study

Díaz, Sonia Pértega; Vilar, José A.

doi:10.1007/s00357-010-9064-6

Published: 21 October 2010

Journal of Classification, volume 27, pages 333–362 (2010)

Sonia Pértega Díaz¹ &
José A. Vilar²

Abstract

A key step in cluster analysis is choosing a similarity or dissimilarity measure between data objects. For time series, similarity can be defined in several ways. In this paper, several nonparametric statistics originally designed to test the equality of the log-spectra of two stochastic processes are proposed as dissimilarity measures between time series. Their behavior in time series clustering is analyzed through a simulation study and compared with the performance of several model-free and model-based dissimilarity measures. Three classification settings were considered: (i) distinguishing between stationary and non-stationary time series, (ii) classifying different ARMA processes, and (iii) classifying several non-linear time series models. As expected, the performance of a particular dissimilarity measure depended strongly on the type of processes being clustered. Among all the measures studied, the nonparametric distances showed the most robust behavior.
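
To make the idea of a log-spectrum-based dissimilarity concrete, the following is a minimal sketch only. It does not reproduce the test statistics proposed in the paper; as a stand-in, it uses the root-mean-square difference between the raw log-periodograms of two series. The function names (`log_periodogram`, `log_spectral_distance`, `distance_matrix`, `simulate_ar1`) and the AR(1) simulation are assumptions introduced purely for illustration.

```python
import numpy as np


def log_periodogram(x):
    """Log-periodogram at the Fourier frequencies 2*pi*k/n, k = 1..floor((n-1)/2)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    x = x - x.mean()                       # remove the sample mean
    dft = np.fft.rfft(x)                   # DFT at non-negative frequencies
    m = (n - 1) // 2                       # drop frequency 0 (and pi when n is even)
    pgram = np.abs(dft[1:m + 1]) ** 2 / (2.0 * np.pi * n)
    return np.log(pgram)


def log_spectral_distance(x, y):
    """Illustrative dissimilarity: RMS difference of the two raw log-periodograms.

    Assumes both series have the same length. This is a simplification,
    not one of the statistics studied in the paper.
    """
    lx, ly = log_periodogram(x), log_periodogram(y)
    return float(np.sqrt(np.mean((lx - ly) ** 2)))


def distance_matrix(series):
    """Symmetric matrix of pairwise dissimilarities for a list of series."""
    k = len(series)
    d = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            d[i, j] = d[j, i] = log_spectral_distance(series[i], series[j])
    return d


def simulate_ar1(phi, n, rng, burn_in=100):
    """Simulate a Gaussian AR(1) process x[t] = phi * x[t-1] + eps[t] (hypothetical helper)."""
    eps = rng.standard_normal(n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + eps[t]
    return x[burn_in:]                     # discard the burn-in segment
```

The resulting matrix can be passed to any clustering routine that accepts precomputed dissimilarities, for example a hierarchical procedure. Because the raw periodogram is a noisy spectral estimate, a smoothed estimate would normally be preferable in practice.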

Author information

Authors and Affiliations

Unidad de Epidemiología Clínica y Bioestadística, Complejo Hospitalario Universitario de A Coruña, As Xubias 84, Hotel de Pacientes Planta 7, E-15006 A Coruña, Spain
Sonia Pértega Díaz
Departamento de Matemáticas, Universidade de A Coruña, Campus de Elviña s/n, E-15071 A Coruña, Spain
José A. Vilar

Corresponding author

Correspondence to Sonia Pértega Díaz.

About this article

Cite this article

Díaz, S.P., Vilar, J.A. Comparing Several Parametric and Nonparametric Approaches to Time Series Clustering: A Simulation Study. J Classif 27, 333–362 (2010). https://doi.org/10.1007/s00357-010-9064-6

Issue Date: November 2010
