Abstract
A key step in cluster analysis is choosing a similarity or dissimilarity measure between data objects. For time series, similarity can be defined in several ways. In this paper, several non-parametric statistics originally designed to test the equality of the log-spectra of two stochastic processes are proposed as dissimilarity measures between time series. Their behavior in time series clustering is analyzed through a simulation study and compared with the performance of several model-free and model-based dissimilarity measures. Three classification settings were considered: (i) distinguishing between stationary and non-stationary time series, (ii) classifying different ARMA processes, and (iii) classifying several non-linear time series models. As expected, the performance of a particular dissimilarity measure depended strongly on the type of processes being clustered. Among all the measures studied, the non-parametric distances showed the most robust behavior.
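To make the idea of a log-spectrum-based dissimilarity concrete, the following is a minimal sketch and not one of the statistics studied in the paper. It assumes a simple moving-average smoother of the log-periodogram (a crude stand-in for a proper non-parametric spectral estimator) and uses the mean absolute difference between the two smoothed curves as the distance. All function names (`log_periodogram`, `smooth`, `log_spectral_distance`, `simulate_ar1`) and the window width are hypothetical choices made for illustration.

```python
import numpy as np

def log_periodogram(x):
    """Log-periodogram at the Fourier frequencies strictly between 0 and pi."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    dft = np.fft.rfft(x)
    m = (n - 1) // 2  # excludes frequency 0 and, for even n, frequency pi
    pgram = np.abs(dft[1:m + 1]) ** 2 / (2.0 * np.pi * n)
    return np.log(pgram)

def smooth(values, width=5):
    """Moving-average smoother (assumption: stands in for a better estimator)."""
    kernel = np.ones(width) / width
    return np.convolve(values, kernel, mode="valid")

def log_spectral_distance(x, y, width=5):
    """Mean absolute difference between two smoothed log-periodograms.

    Both series are assumed to have the same length.
    """
    lx = smooth(log_periodogram(x), width)
    ly = smooth(log_periodogram(y), width)
    return float(np.mean(np.abs(lx - ly)))

def simulate_ar1(phi, n, rng, burn_in=100):
    """Simulate a Gaussian AR(1) series, discarding a burn-in period."""
    eps = rng.standard_normal(n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + eps[t]
    return x[burn_in:]

# Illustrative data: three AR(1) series with phi = 0.8, three with phi = -0.8.
rng = np.random.default_rng(0)
series = [simulate_ar1(0.8, 512, rng) for _ in range(3)]
series += [simulate_ar1(-0.8, 512, rng) for _ in range(3)]

# Pairwise dissimilarity matrix, usable as input to a clustering algorithm.
dist = np.array([[log_spectral_distance(a, b) for b in series] for a in series])
```

The resulting matrix `dist` can be passed to any clustering procedure that accepts precomputed dissimilarities, such as agglomerative hierarchical clustering. In this toy setting the two groups have very different spectra, so series generated by the same process should sit closer together than series from different processes.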
Cite this article
Díaz, S.P., Vilar, J.A. Comparing Several Parametric and Nonparametric Approaches to Time Series Clustering: A Simulation Study. J Classif 27, 333–362 (2010). https://doi.org/10.1007/s00357-010-9064-6
DOI: https://doi.org/10.1007/s00357-010-9064-6