research-article

Machine learning for streaming data: state of the art, challenges, and opportunities

Published: 26 November 2019

Abstract

Incremental learning, online learning, and data stream learning are terms commonly associated with learning algorithms that update their models given a continuous influx of data without performing multiple passes over the data. Several works have been devoted to this area, either directly or indirectly as characteristics of big data processing, i.e., Velocity and Volume. Given current industry needs, many challenges must be addressed before existing methods can be efficiently applied to real-world problems. In this work, we focus on elucidating the connections among the current state of the art in related fields and on clarifying open challenges in both academia and industry. We treat with special care topics that were not thoroughly investigated in past position and survey papers. This work aims to evoke discussion and elucidate the current research opportunities, highlighting the relationships among different subareas and suggesting courses of action when possible.

Published in

ACM SIGKDD Explorations Newsletter, Volume 21, Issue 2
December 2019
100 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/3373464

Copyright © 2019. Copyright is held by the owner/author(s).

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States
