Abstract
The increasing relevance of data streams in machine learning and artificial intelligence motivates this paper, which discusses and draws the necessary relationships between the concepts of data streams and time series in an attempt to build theoretical foundations that support online learning in such scenarios. We unify the concepts of data streams and time series by assessing their definitions in the literature, and we discuss the major implications of this claim for the way data-stream research and practice are carried out, showing that many common assumptions are incorrect or unnecessary. We analyzed six data sources typically used as benchmarks in data-stream classification and found that none of them meets the requirements and assumptions that would qualify it for online learning.
Notes
1. ITS provides federal departments and agencies comprehensive definitions of terms used in telecommunications and directly related fields by the U.S. Government and internationally; see https://www.its.bldrdoc.gov/fs-1037/dir-010/_1451.htm.
2. This dataset and corresponding sources are available at https://moa.cms.waikato.ac.nz/datasets.
3. Again, see https://moa.cms.waikato.ac.nz/~datasets/.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Read, J., Rios, R.A., Nogueira, T., de Mello, R.F. (2020). Data Streams Are Time Series: Challenging Assumptions. In: Cerri, R., Prati, R.C. (eds) Intelligent Systems. BRACIS 2020. Lecture Notes in Computer Science, vol 12320. Springer, Cham. https://doi.org/10.1007/978-3-030-61380-8_36
DOI: https://doi.org/10.1007/978-3-030-61380-8_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61379-2
Online ISBN: 978-3-030-61380-8
eBook Packages: Computer Science (R0)