TS-stream: clustering time series on data streams

Pereira, Cássio M. M.; de Mello, Rodrigo F.

doi:10.1007/s10844-013-0290-3


Published: 01 December 2013

Volume 42, pages 531–566, (2014)

Journal of Intelligent Information Systems



Abstract

The current ability to produce massive amounts of data, together with the impossibility of storing all of it, has motivated the development of data stream mining strategies. Although many techniques have been proposed, the area still lacks approaches for mining data streams composed of multiple time series, a setting with applications in finance, medicine and science. Most current techniques for clustering streaming time series have a serious limitation: their similarity measures are based on the Pearson correlation. In this paper, we show that the Pearson correlation is not capable of detecting similarities even for classic time series models, such as those by Box and Jenkins. This limitation motivated our proposal to cluster streaming time series based on their generating functions, which is achieved by considering features obtained using descriptive measures such as Auto Mutual Information, the Hurst Exponent and several others. We present a new tree-based clustering algorithm, entitled TS-Stream, which uses the extracted features to produce partitions in better accordance with the time series generating functions. Experiments with synthetic data sets confirm that TS-Stream outperforms ODAC, currently the most popular technique, in terms of clustering quality. Using real financial time series from the NYSE and NASDAQ, we conducted stock trading simulations employing TS-Stream to support the creation of diversified investment portfolios. Results confirmed that TS-Stream increased monetary returns by several orders of magnitude compared with trading strategies based simply on the Moving Average Convergence Divergence financial indicator.
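The limitation of the Pearson correlation mentioned in the abstract can be illustrated with a minimal sketch. It is not the experiment from the paper: the AR(1) model, the coefficient 0.5, the series length, the seeds and the helper names (`ar1`, `pearson`, `lag1_autocorr`) are all assumptions chosen for illustration, and the lag-1 autocorrelation is only a simple stand-in for the descriptive measures the paper names (Auto Mutual Information, the Hurst Exponent and others).

```python
import math
import random


def ar1(phi, n, seed):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal noise e_t."""
    rng = random.Random(seed)
    x, series = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def lag1_autocorr(a):
    """Lag-1 autocorrelation, estimated as the Pearson correlation
    between the series and its one-step shift."""
    return pearson(a[:-1], a[1:])


# Two independent realizations of the SAME generating process (assumed AR(1)).
s1 = ar1(phi=0.5, n=5000, seed=1)
s2 = ar1(phi=0.5, n=5000, seed=2)

r = pearson(s1, s2)                            # point-wise similarity
f1, f2 = lag1_autocorr(s1), lag1_autocorr(s2)  # one feature per series

print(f"Pearson correlation between the two series: {r:.3f}")
print(f"Lag-1 autocorrelation of each series: {f1:.3f}, {f2:.3f}")
```

Under these assumptions the point-wise correlation is expected to be close to zero, because the two noise sequences are independent, while the feature computed from each series separately is expected to be close to the shared coefficient 0.5. This is the intuition behind comparing series through features of their generating process rather than through their raw values.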


Notes

Determinism is a measure based on Recurrence Quantification Analysis (Marwan et al. 2007).
Defined as \(\sigma^{2}(X) = \sum_{i=1}^{n} p_{i} \cdot (x_{i} - \mu)^{2}\) for a series \(X = (x_{1}, x_{2}, \ldots, x_{n})\), in which \(p_{i}\) is the probability of observation \(x_{i}\) and \(\mu\) is the mean of the sequence.
A deterministic, chaotic, nonlinear model.
In this case, outliers are simply exceptional gains; when plotted, however, they hinder visualization of the distribution.
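The variance definition in the second note can be sketched as follows. The function name `weighted_variance` and the example values are hypothetical, and the mean is assumed to be the probability-weighted mean \(\mu = \sum_{i} p_{i} x_{i}\), which coincides with the arithmetic mean of the sequence when every observation is equally likely.

```python
def weighted_variance(values, probs):
    """Compute sum_i p_i * (x_i - mu)^2 with mu = sum_i p_i * x_i (assumed)."""
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mu = sum(p * x for x, p in zip(values, probs))
    return sum(p * (x - mu) ** 2 for x, p in zip(values, probs))


# Equally likely observations: p_i = 1/n for every x_i.
series = [1.0, 2.0, 3.0, 4.0]
uniform = [1.0 / len(series)] * len(series)

var_uniform = weighted_variance(series, uniform)
print(var_uniform)  # 1.25
```

With uniform probabilities the result is the population variance of the sequence: the mean is 2.5 and the squared deviations 2.25, 0.25, 0.25 and 2.25 average to 1.25.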

References

Aggarwal, C.C., Han, J., Wang, J., Yu, P.S. (2003). A framework for clustering evolving data streams. In VLDB ’2003: Proceedings of the 29th international conference on very large data bases (pp. 81–92). VLDB Endowment.
Aggarwal, C.C., Han, J., Wang, J., Yu, P.S. (2004). A framework for projected clustering of high dimensional data streams. In VLDB ’04: Proceedings of the 30th international conference on very large data bases (pp. 852–863). VLDB Endowment.
Ahmed, N., Natarajan, T., Rao, K.R. (1974). Discrete cosine transform. IEEE Transactions on Computers, 23, 90–93.
Appel, G. (2005). Technical analysis: power tools for active investors, 1st edn, FT Press.
Ardia, D., Boudt, K., Carl, P., Mullen, K.M., Peterson, B.G. (2011). Differential Evolution with DEoptim: An application to non-convex portfolio optimization. The R Journal, 3(1), 27–34.
Athanassiou, P. (2012). Research handbook on hedge funds, private equity and alternative investments, Edward Elgar Pub.
Bélisle, C. (1992). Convergence theorems for a class of simulated annealing algorithms on \(\mathbb{R}^{d}\). Journal of Applied Probability, 885–895.
Beringer, J., & Hüllermeier, E. (2006). Online clustering of parallel data streams. Data & Knowledge Engineering, 58, 180–204. doi:10.1016/j.datak.2005.05.009.
Bifet, A., & Kirby, R. (2009). Data stream mining: A practical approach. Technical report, The University of Waikato.
Black, F., & Scholes, M. (1973). The pricing of options and corporate liabilities. The Journal of Political Economy, 637–654.
Box, G., & Jenkins, G. (1994). Time series analysis: forecasting and control, Prentice Hall PTR.
Cao, F. (2006). Density-based clustering over an evolving data stream with noise. In Proceedings of the 6th SIAM international conference data mining.
Chaovalit, P. (2009). Clustering transient data streams by example and by variable. PhD thesis, University of Maryland.
Chatfield, C. (2003). The analysis of time series: an introduction (Vol. 59). CRC press.
Daubechies, I. (1992). Ten lectures on wavelets. Philadelphia: Society for Industrial and Applied Mathematics.
Díaz, S.P., & Vilar, J.A. (2010). Comparing several parametric and nonparametric approaches to time series clustering: a simulation study. Journal of Classification, 27, 333–362. doi:10.1007/s00357-010-9064-6.
D’haeseleer, P., et al. (2005). How does gene expression clustering work? Nature Biotechnology, 23(12), 1499–1502.
Fourier, J. (1888). Théorie analytique de la chaleur (Vol. 1). Gauthier-Villars et fils.
Fu, T.C. (2011). A review on time series data mining. Engineering Applications of Artificial Intelligence, 24(1), 164–181. doi:10.1016/j.engappai.2010.09.007, http://www.sciencedirect.com/science/article/B6V2M-516KF3X-1/2/f93f19227049b30e34b3de788e9e2b7f.
Fujita, A., Sato, J., Demasi, M., Sogayar, M., Ferreira, C., Miyano, S. (2009). Comparing Pearson, Spearman and Hoeffding’s D measure for gene expression association analysis. Journal of Bioinformatics and Computational Biology, 7(4), 663–684.
Gama, J. (2010). Knowledge discovery from data streams, 1st edn. Chapman & Hall/CRC.
Hilbert, D. (1912). Grundzüge einer allgemeinen Theorie der linearen Integralgleichungen. BG Teubner.
Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 13–30.
Huang, N., Shen, Z., Long, S., Wu, M., Shih, H., Zheng, Q., Yen, N., Tung, C., Liu, H. (1998). The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London Series A: Mathematical, Physical and Engineering Sciences, 454(1971), 903.
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218. doi:10.1007/BF01908075.
Ishii, R.P., Rios, R.A., de Mello, R.F. (2011). Classification of time series generation processes using experimental tools: a survey and proposal of an automatic and systematic approach. International Journal of Computational Science and Engineering, 1, 1–21.
Kantz, H., & Schreiber, T. (1997). Nonlinear time series analysis. New York: Cambridge University Press.
Keogh, E., Lin, J., Truppel, W. (2003). Clustering of time series subsequences is meaningless: implications for previous and future research. In ICDM ’03: Proceedings of the 3rd IEEE international conference on data mining, IEEE computer society (pp. 115–). Washington, DC. http://dl.acm.org/citation.cfm?id=951949.952156.
Kontaki, M., Papadopoulos, A.N., Manolopoulos, Y. (2008). Continuous subspace clustering in streaming time series. Information Systems, 33(2), 240–260.
Kranen, P., Assent, I., Baldauf, C., Seidl, T. (2011). The clustree: indexing micro-clusters for anytime stream mining. Knowledge and Information Systems, 29(2), 249–272. doi:10.1007/s10115-010-0342-8.
Lorenz, E.N. (1963). Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20(2), 130–141.
MacKay, D. (2003). Information theory, inference, and learning algorithms. Cambridge: Cambridge University Press.
Marwan, N., Romano, M.C., Thiel, M., Kurths, J. (2007). Recurrence plots for the analysis of complex systems. Physics Reports, 438(5-6), 237–329. doi:10.1016/j.physrep.2006.11.001.
Peng, C., Buldyrev, S., Havlin, S., Simons, M., Stanley, H., Goldberger, A. (1994). On the mosaic organization of DNA sequences. Physical Review E, 49, 1685–1689.
Pompe, B. (1993). Measuring statistical dependences in a time series. Journal of Statistical Physics, 73(3), 587–610.
Quinlan, J. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.
Quinlan, J. (1993). C4.5: programs for machine learning. Morgan Kaufmann.
R Development Core Team (2011). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org, ISBN 3-900051-07-0.
Ren, J., Cai, B., Hu, C. (2011). Clustering over data streams based on grid density and index tree. Journal of Convergence Information Technology, 6(1).
Rodrigues, P.P., Gama, J., Pedroso, J. (2008). Hierarchical clustering of time-series data streams. IEEE Transactions on Knowledge and Data Engineering, 20, 615–627. doi:10.1109/TKDE.2007.190727.
Sandri, M. (1996). Numerical calculation of Lyapunov exponents. The Mathematica Journal, 6(3), 78–84.
Skiena, S.S. (1998). The algorithm design manual. New York: Springer.
Takens, F. (1981). Detecting strange attractors in turbulence. In Dynamical systems and turbulence, Warwick 1980 (pp. 366–381).
Tan, P.N., Steinbach, M., Kumar, V. (2005). Introduction to Data Mining. Boston: Addison-Wesley Longman.
Tang, L.A., Zheng, Y., Yuan, J., Han, J., Leung, A., Hung, C.C., Peng, W.C. (2012). On discovery of traveling companions from streaming trajectories. In 2012 IEEE 28th International Conference on Data Engineering (ICDE) (pp. 186–197). IEEE.
Vinh, N.X., Epps, J., Bailey, J. (2009). Information theoretic measures for clusterings comparison: is a correction for chance necessary? In ICML ’09 (pp. 1073–1080). New York: ACM. doi:10.1145/1553374.1553511.
Walpole, R., Myers, R., Myers, S., Ye, K. (1998). Probability and statistics for engineers and scientists. Upper Saddle River: Prentice Hall.
Wan, L., Ng, W.K., Dang, X.H., Yu, P.S., Zhang, K. (2009). Density-based clustering of data streams at multiple resolutions. ACM Transactions on Knowledge Discovery from Data, 3(3), 1–28. doi:10.1145/1552303.1552307.
Whitney, H. (1936). Differentiable manifolds. Annals of Mathematics, 37(3), 645–680.
Widiputra, H., Pears, R., Kasabov, N. (2011). Multiple time-series prediction through multiple time-series relationships profiling and clustered recurring trends. In Proceedings of the 15th Pacific-Asia conference on advances in knowledge discovery and data mining (pp. 161–172). Berlin, Heidelberg: Springer-Verlag.
Yang, Y., & Chen, K. (2011). Temporal data clustering via weighted clustering ensemble with different representations. IEEE Transactions on Knowledge and Data Engineering, 23, 307–320.
Zheng, K., Zheng, Y., Yuan, N.J., Shang, S. (2013). On discovery of gathering patterns from trajectories. In IEEE international conference on data engineering, ICDE.


Acknowledgments

This paper is based upon work supported by FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo)—Brazil under grant #2010/05062-6. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of FAPESP. We also thank the anonymous reviewers who suggested improvements to the presentation of this paper.

Author information

Authors and Affiliations

Institute of Mathematical and Computer Sciences (ICMC-USP), University of São Paulo, Av. Trabalhador São-carlense, 400, São Carlos, SP 13566-590, Brazil
Cássio M. M. Pereira & Rodrigo F. de Mello


Corresponding author

Correspondence to Cássio M. M. Pereira.


About this article

Cite this article

Pereira, C.M.M., de Mello, R.F. TS-stream: clustering time series on data streams. J Intell Inf Syst 42, 531–566 (2014). https://doi.org/10.1007/s10844-013-0290-3


Received: 03 July 2013
Revised: 13 November 2013
Accepted: 15 November 2013
Published: 01 December 2013
Issue Date: June 2014
DOI: https://doi.org/10.1007/s10844-013-0290-3
