Abstract
Time series clustering assigns a set of time series to groups whose members share certain similarities. It has become an attractive analytic tool because many applications require such classifications. Clustering may also yield more accurate parameter estimates when a group of time series is assumed to share a common model and parameters, especially for short panel time series. Many existing time series clustering methods assume that the time series are linear. However, linearity assumptions often fail to hold. In this paper we consider the problem of clustering nonlinear time series. We propose using a two-dimensional Kolmogorov-Smirnov statistic as a distance measure between two time series, measuring the affinity of their nonlinear serial dependence structures. The measure is nonparametric, so no model assumptions are needed. The approach is illustrated with simulation studies as well as real data examples.
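To make the idea of a two-dimensional Kolmogorov-Smirnov distance concrete, the following is a minimal sketch and not the paper's implementation. It assumes that serial dependence is summarized by the lagged pairs (x[t-lag], x[t]), that each series is standardized first, and that the statistic is the largest difference in quadrant proportions over origins placed at every pooled data point (a brute-force simplification in the spirit of quadrant-based 2D KS tests). The names `lag_pairs`, `quadrant_fractions`, and `ks2d_distance` are hypothetical.

```python
import numpy as np


def lag_pairs(x, lag=1):
    """Return the (x[t-lag], x[t]) pairs of a series as an (n-lag, 2) array."""
    x = np.asarray(x, dtype=float)
    # Assumption: standardize so the distance reflects dependence, not scale.
    x = (x - x.mean()) / x.std()
    return np.column_stack([x[:-lag], x[lag:]])


def quadrant_fractions(points, origin):
    """Fraction of points falling in each of the four quadrants around origin."""
    right = points[:, 0] > origin[0]
    upper = points[:, 1] > origin[1]
    return np.array([
        np.mean(right & upper),
        np.mean(~right & upper),
        np.mean(~right & ~upper),
        np.mean(right & ~upper),
    ])


def ks2d_distance(x, y, lag=1):
    """Brute-force 2D KS-type distance between the lag-pair clouds of x and y."""
    a, b = lag_pairs(x, lag), lag_pairs(y, lag)
    d = 0.0
    # Assumption: origins are all pooled data points; cost is quadratic in length.
    for origin in np.vstack([a, b]):
        diff = np.abs(quadrant_fractions(a, origin) - quadrant_fractions(b, origin))
        d = max(d, float(diff.max()))
    return d


# Illustrative data only: a linear AR(1) series and a threshold-type series.
rng = np.random.default_rng(0)
n = 200
linear = np.zeros(n)
threshold = np.zeros(n)
for t in range(1, n):
    linear[t] = 0.5 * linear[t - 1] + rng.standard_normal()
    coef = 0.8 if threshold[t - 1] <= 0 else -0.8
    threshold[t] = coef * threshold[t - 1] + rng.standard_normal()

print(ks2d_distance(linear, threshold))
```

A matrix of such pairwise distances could then be passed to any standard clustering procedure that accepts dissimilarities, for example hierarchical clustering.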
Cite this article
Zhang, B., Chen, R. Nonlinear Time Series Clustering Based on Kolmogorov-Smirnov 2D Statistic. J Classif 35, 394–421 (2018). https://doi.org/10.1007/s00357-018-9271-0
DOI: https://doi.org/10.1007/s00357-018-9271-0