Data Streams Are Time Series: Challenging Assumptions

  • Conference paper
Intelligent Systems (BRACIS 2020)

Abstract

The increasing relevance of data streams in machine learning and artificial intelligence has motivated this paper, which discusses and draws the necessary relationships between the concepts of data streams and time series in an attempt to build on theoretical foundations that support online learning in such scenarios. We unify the concepts of data streams and time series by assessing their definitions in the literature, and we discuss the major implications of this claim for the way data-stream research and practice are carried out, showing that many common assumptions are incorrect or unnecessary. We analyzed six data sources typically used as benchmarks in data-stream classification and found that none of them meets the requirements and assumptions that would qualify it for online learning.

Notes

  1. ITS provides federal departments and agencies comprehensive definitions of terms used in telecommunications and directly related fields by the U.S. Government and internationally; see https://www.its.bldrdoc.gov/fs-1037/dir-010/_1451.htm.

  2. This dataset and corresponding sources are available at https://moa.cms.waikato.ac.nz/datasets.

  3. Again, see https://moa.cms.waikato.ac.nz/~datasets/.

Author information

Corresponding author

Correspondence to Ricardo A. Rios.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Read, J., Rios, R.A., Nogueira, T., de Mello, R.F. (2020). Data Streams Are Time Series: Challenging Assumptions. In: Cerri, R., Prati, R.C. (eds) Intelligent Systems. BRACIS 2020. Lecture Notes in Computer Science, vol 12320. Springer, Cham. https://doi.org/10.1007/978-3-030-61380-8_36

  • DOI: https://doi.org/10.1007/978-3-030-61380-8_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61379-2

  • Online ISBN: 978-3-030-61380-8

  • eBook Packages: Computer Science (R0)
