
Nonlinear Time Series Clustering Based on Kolmogorov-Smirnov 2D Statistic

Journal of Classification

Abstract

Time series clustering assigns a set of time series to groups whose members share certain similarities. It has become an attractive analytic tool because many applications require such classifications. Clustering may also yield more accurate parameter estimates when a group of time series is assumed to share common models and parameters, especially for short panel time series. Many existing time series clustering methods assume that the time series are linear; however, linearity assumptions often fail to hold. In this paper we consider the problem of clustering nonlinear time series. We propose using a two-dimensional Kolmogorov-Smirnov statistic as a distance measure between two time series, measuring the affinity of their nonlinear serial dependence structures. The approach is nonparametric, so no model assumptions are needed. It is illustrated with simulation studies as well as real data examples.
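The abstract does not spell out the construction, so what follows is a minimal sketch of one plausible reading, not the authors' implementation. It rests on several assumptions:

- Each series is embedded as its lag-1 pairs (x_{t-1}, x_t).
- The distance between two series is a two-sample 2D Kolmogorov-Smirnov statistic between their pair clouds.
- The statistic uses the Fasano-Franceschini quadrant construction: quadrant origins sit at the data points, and the results for the two samples are averaged.
- The lag, the tie convention and the simulators are illustrative choices.
- All helper names (`lag_pairs`, `ks2d_statistic`, `ks2d_distance`, `distance_matrix`, `simulate_ar1`, `simulate_setar`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def lag_pairs(series: Sequence[float], lag: int = 1) -> List[Point]:
    """Embed a series as points (x_{t-lag}, x_t) carrying its serial dependence (assumed lag)."""
    if lag < 1 or lag >= len(series):
        raise ValueError("need 1 <= lag < len(series)")
    return [(series[t - lag], series[t]) for t in range(lag, len(series))]


def quadrant_fractions(points: Sequence[Point], x0: float, y0: float) -> Tuple[float, float, float, float]:
    """Share of points in each quadrant around (x0, y0); ties go to the lower/left side."""
    upper_right = upper_left = lower_left = lower_right = 0
    for x, y in points:
        if y > y0:
            if x > x0:
                upper_right += 1
            else:
                upper_left += 1
        elif x > x0:
            lower_right += 1
        else:
            lower_left += 1
    n = len(points)
    return (upper_right / n, upper_left / n, lower_left / n, lower_right / n)


def max_quadrant_gap(origins: Sequence[Point], a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest |share_a - share_b| over all given origins and all four quadrants."""
    best = 0.0
    for x0, y0 in origins:
        fa = quadrant_fractions(a, x0, y0)
        fb = quadrant_fractions(b, x0, y0)
        best = max(best, max(abs(p - q) for p, q in zip(fa, fb)))
    return best


def ks2d_statistic(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Fasano-Franceschini-style statistic: mean of the largest gaps using origins at a and at b."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    return 0.5 * (max_quadrant_gap(a, a, b) + max_quadrant_gap(b, a, b))


def ks2d_distance(s1: Sequence[float], s2: Sequence[float], lag: int = 1) -> float:
    """Distance between two series: 2D KS statistic between their lag-pair clouds."""
    return ks2d_statistic(lag_pairs(s1, lag), lag_pairs(s2, lag))


def distance_matrix(panel: Sequence[Sequence[float]], lag: int = 1) -> List[List[float]]:
    """Symmetric pairwise distance matrix with zeros on the diagonal."""
    m = len(panel)
    d = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            d[i][j] = d[j][i] = ks2d_distance(panel[i], panel[j], lag)
    return d


def simulate_ar1(n: int, phi: float, rng: random.Random) -> List[float]:
    """Hypothetical linear AR(1) generator with standard normal noise."""
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def simulate_setar(n: int, rng: random.Random) -> List[float]:
    """Hypothetical two-regime threshold AR(1): slope 0.8 at or below 0, -0.5 above."""
    x = [0.0]
    for _ in range(n - 1):
        phi = 0.8 if x[-1] <= 0.0 else -0.5
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    panel = [simulate_ar1(200, 0.5, rng), simulate_ar1(200, 0.5, rng),
             simulate_setar(200, rng), simulate_setar(200, rng)]
    for row in distance_matrix(panel):
        print(" ".join(f"{v:.3f}" for v in row))
```

The resulting matrix can be handed to any standard hierarchical clustering routine. With SciPy, for instance, convert it with `scipy.spatial.distance.squareform` and pass the result to `scipy.cluster.hierarchy.linkage`. This brute-force statistic costs O(n²) work per pair of series. That is adequate for short panels but slow for long series.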

Author information

Correspondence to Rong Chen.

Cite this article

Zhang, B., Chen, R. Nonlinear Time Series Clustering Based on Kolmogorov-Smirnov 2D Statistic. J Classif 35, 394–421 (2018). https://doi.org/10.1007/s00357-018-9271-0
