Nonlinear Time Series Clustering Based on Kolmogorov-Smirnov 2D Statistic

Zhang, Beibei; Chen, Rong

doi:10.1007/s00357-018-9271-0

Published: 09 October 2018

Volume 35, pages 394–421 (2018)

Journal of Classification

Abstract

Time series clustering assigns a set of time series to groups whose members share certain similarities. It has become an attractive analytic tool because many applications require such classifications. Clustering may also yield more accurate parameter estimates when a group of time series is assumed to share common models and parameters, especially for short panel time series. Many existing time series clustering methods assume that the time series are linear. However, linearity assumptions often fail to hold. In this paper we consider the problem of clustering nonlinear time series. We propose using a two-dimensional Kolmogorov-Smirnov statistic as a distance measure between two time series; it measures the affinity of their nonlinear serial dependence structures. The measure is nonparametric, so no model assumptions are needed. The approach is illustrated with simulation studies and real data examples.
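The abstract describes the distance only at a high level, so the following Python code is a minimal sketch of the general idea under explicit assumptions, not the authors' implementation. It assumes that each series is embedded as lag-1 pairs (x_{t-1}, x_t), which capture its serial dependence structure (the paper may use other lags). It also assumes that two embedded series are compared with a simplified quadrant-based two-sample 2D KS statistic that uses every pooled sample point as a quadrant origin and takes the largest difference in quadrant fractions; this is a variant in the spirit of Fasano and Franceschini, not their exact averaging rule. Finally, it assumes that the resulting distance matrix is clustered with naive average-linkage agglomeration, an illustrative choice. All helper names (`lag_pairs`, `ks2d`, `distance_matrix`, `average_linkage`, `simulate`, `linear_ar`, `threshold_ar`) and the two simulated models are hypothetical.

```python
import random
from itertools import combinations


def lag_pairs(series, lag=1):
    """Embed a series as 2D points (x_{t-lag}, x_t); lag=1 is an assumption."""
    return [(series[t - lag], series[t]) for t in range(lag, len(series))]


def quadrant_fractions(points, cx, cy):
    """Fraction of points in each quadrant around the origin (cx, cy).

    Index 0: x <= cx, y <= cy; 1: x <= cx, y > cy;
    2: x > cx, y <= cy; 3: x > cx, y > cy.
    """
    counts = [0, 0, 0, 0]
    for x, y in points:
        counts[2 * (x > cx) + (y > cy)] += 1
    return [c / len(points) for c in counts]


def ks2d(a, b):
    """Simplified two-sample 2D KS statistic (quadrant-based).

    Every point of the pooled sample is used as a quadrant origin, and the
    largest absolute difference in quadrant fractions is returned.
    """
    d = 0.0
    for cx, cy in a + b:
        fa = quadrant_fractions(a, cx, cy)
        fb = quadrant_fractions(b, cx, cy)
        d = max(d, max(abs(p - q) for p, q in zip(fa, fb)))
    return d


def distance_matrix(series_list, lag=1):
    """Pairwise KS-2D distances between the lag-pair embeddings of each series."""
    emb = [lag_pairs(s, lag) for s in series_list]
    n = len(emb)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = ks2d(emb[i], emb[j])
    return dist


def average_linkage(dist, k):
    """Naive agglomerative clustering with average linkage, stopping at k clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            d = sum(dist[i][j] for i in clusters[p] for j in clusters[q])
            d /= len(clusters[p]) * len(clusters[q])
            if best is None or d < best[0]:
                best = (d, p, q)
        _, p, q = best
        clusters[p] += clusters[q]  # merge the closest pair (p < q)
        del clusters[q]
    return sorted(sorted(c) for c in clusters)


def simulate(n, mean_fn, rng, burn_in=100):
    """Simulate x_t = mean_fn(x_{t-1}) + e_t with e_t ~ N(0, 1)."""
    x, out = 0.0, []
    for _ in range(n + burn_in):
        x = mean_fn(x) + rng.gauss(0.0, 1.0)
        out.append(x)
    return out[burn_in:]


def linear_ar(x):
    return 0.5 * x  # hypothetical linear AR(1) conditional mean


def threshold_ar(x):
    return 0.5 * x if x <= 0 else -0.9 * x  # hypothetical threshold AR(1)


if __name__ == "__main__":
    rng = random.Random(0)
    series = [simulate(150, linear_ar, rng) for _ in range(3)]
    series += [simulate(150, threshold_ar, rng) for _ in range(3)]
    # Indices 0-2 come from the linear model, 3-5 from the threshold model.
    print(average_linkage(distance_matrix(series), k=2))
```

The statistic lies in [0, 1] and is 0 when the two samples coincide. Because it compares the empirical joint distribution of lag pairs rather than fitted model parameters, it needs no model assumptions. The naive version costs O(n²) per pair of series; faster algorithms for the two-sample 2D KS statistic exist and would matter for long series.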

Author information

Authors and Affiliations

Capital University of Economics and Business, Beijing, China
Beibei Zhang
Department of Statistics and Biostatistics, Rutgers University, New Brunswick, NJ, 08854, USA
Rong Chen

Corresponding author

Correspondence to Rong Chen.

About this article

Cite this article

Zhang, B., Chen, R. Nonlinear Time Series Clustering Based on Kolmogorov-Smirnov 2D Statistic. J Classif 35, 394–421 (2018). https://doi.org/10.1007/s00357-018-9271-0

Published: 09 October 2018
Issue Date: October 2018
DOI: https://doi.org/10.1007/s00357-018-9271-0

Similar content being viewed by others

Clustering Time Series by Nonlinear Dependence

Clustering of time series using quantile autocovariances

Hierarchical Clustering of Time Series with Wasserstein Distance
