Making clustering in delay-vector space meaningful

  • Regular Paper
Knowledge and Information Systems

Abstract

Sequential time series clustering is a technique used to extract important features from time series data. The method can be shown to be equivalent to clustering in the delay-vector space formalism used in the dynamical systems literature. Recently, the startling claim was made that sequential time series clustering is meaningless. This has important consequences for a significant body of work in the literature, since such a claim invalidates the contribution of that work. In this paper, we show that sequential time series clustering is not meaningless, and that the problem highlighted in these works stems from their use of the Euclidean distance metric as the distance measure in the delay-vector space. As a solution, we consider quite a general class of time series, and propose a regime based on two types of similarity that can exist between delay vectors, giving rise naturally to an alternative to Euclidean distance as the distance measure in the delay-vector space. We show that, using this alternative distance measure, sequential time series clustering can indeed be meaningful. We repeat a key experiment from the work on which the “meaningless” claim was based, and show that our method leads to a successful clustering outcome.
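As a rough illustration of the setting the abstract describes, the sketch below builds delay vectors (sliding-window subsequences) from a series and assigns each one to its nearest centre under Euclidean distance, which is the baseline distance measure the abstract identifies as the source of the problem. It is not the paper's proposed method: the alternative distance measure is not specified in the abstract and is not reproduced here. The names `delay_vectors`, `euclidean`, `assign_to_nearest`, the window length `w`, and the toy series and centres are all assumptions made for this example.

```python
import math


def delay_vectors(series, w, step=1):
    """Return the length-w subsequences (delay vectors) of `series`.

    Consecutive vectors start `step` samples apart, so with step=1
    neighbouring vectors overlap in w - 1 samples.
    """
    if w < 1 or w > len(series):
        raise ValueError("w must satisfy 1 <= w <= len(series)")
    return [tuple(series[i:i + w]) for i in range(0, len(series) - w + 1, step)]


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def assign_to_nearest(vectors, centres):
    """One k-means-style assignment step: index of the nearest centre."""
    return [
        min(range(len(centres)), key=lambda k: euclidean(v, centres[k]))
        for v in vectors
    ]


# Toy periodic series (hypothetical data, chosen only for illustration).
series = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
vectors = delay_vectors(series, w=3)

# Two hand-picked centres in the 3-dimensional delay-vector space.
centres = [(0.0, 1.0, 0.0), (1.0, 0.0, -1.0)]
labels = assign_to_nearest(vectors, centres)
```

Swapping `euclidean` for a different distance function is the only change needed to experiment with other measures in the same delay-vector space, which is the kind of substitution the abstract argues for.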

Author information

Corresponding author

Correspondence to Jason R. Chen.

Additional information

Jason R. Chen received the B.E. degree from Sydney University, Australia, in 1991 and then worked mainly in the banking and finance industry until 1997. From 1997 to 2001, he completed a Ph.D. in robotics at the Australian National University, Canberra, Australia. Since 2001, he has been a Research Engineer in the Research School of Information Science and Engineering at the Australian National University. His research interests broadly include robotics, data mining, and AI.

About this article

Cite this article

Chen, J.R. Making clustering in delay-vector space meaningful. Knowl Inf Syst 11, 369–385 (2007). https://doi.org/10.1007/s10115-006-0042-6

  • DOI: https://doi.org/10.1007/s10115-006-0042-6
