
A Survey on Supervised Classification on Data Streams

Chapter in: Business Intelligence (eBISS 2014)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 205)

Abstract

The last ten years have been prolific for statistical learning and data mining, and it is now easy to find learning algorithms that are fast and automatic. Historically, a strong assumption was that all examples were available, or could be loaded into memory, so that learning algorithms could use them straight away. But new use cases generating large amounts of data have recently emerged: monitoring of telecommunication networks, user modeling in dynamic social networks, web mining, etc. The volume of data grows rapidly, and it is now necessary to use incremental learning algorithms on data streams. This article presents the main approaches to incremental supervised classification available in the literature. It aims to give basic knowledge to a reader new to this subject.
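
To make the streaming setting concrete, here is a minimal sketch (not taken from the chapter) of the test-then-train protocol that underlies most incremental classifiers: each example is first used to evaluate the current model and then, immediately and only once, to update it. The online perceptron, the synthetic stream and every name below are illustrative assumptions, not the chapter's method.

    # Minimal test-then-train (prequential) loop on a data stream.
    # The learner here is a plain online perceptron; labels are in {-1, +1}.
    import numpy as np

    def perceptron_stream(stream, n_features):
        w = np.zeros(n_features)
        b = 0.0
        errors, seen = 0, 0
        for x, y in stream:
            y_hat = 1 if np.dot(w, x) + b >= 0 else -1
            errors += int(y_hat != y)        # test first (prequential error)
            if y_hat != y:                   # then train on the same example
                w += y * x
                b += y
            seen += 1
        return w, b, errors / max(seen, 1)

    # Toy usage on a synthetic, linearly separable stream of 10,000 examples.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10000, 5))
    y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)
    w, b, prequential_error = perceptron_stream(zip(X, y), n_features=5)
    print(f"prequential error: {prequential_error:.3f}")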


Notes

  1. This bound is not used correctly in many incremental tree algorithms, as explained in [55], but this has little influence on the results (a sketch of the split test in question follows these notes).

  2. Multi-armed bandits explore and exploit an online set of decisions while minimizing the cumulative regret between the chosen decisions and the optimal decision (a bandit sketch also follows these notes). Originally, multi-armed bandits were used in pharmacology to choose the best drug while minimizing the number of tests. Today, they tend to replace A/B testing for website optimization (e.g., Google Analytics) and are used for ad-serving optimization. They are well suited when the true class to predict is not known: for instance, in some domains the learning algorithm receives only partial feedback on its prediction, i.e., a single bit of right-or-wrong, rather than the true label.

  3. http://moa.cms.waikato.ac.nz/datasets/.
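
Regarding note 1, the sketch below shows the standard VFDT-style use of the Hoeffding bound [54, 56] that reference [55] critiques: a leaf is split when the empirical gain of the best attribute exceeds that of the runner-up by more than epsilon = sqrt(R^2 ln(1/delta) / (2n)). The function names, the default delta, the value range R and the tie threshold tau are illustrative assumptions, not values prescribed by the chapter.

    # Hoeffding-bound split test as typically used in incremental trees.
    import math

    def hoeffding_bound(value_range, delta, n):
        # epsilon such that, with probability 1 - delta, the true mean of n
        # bounded observations is within epsilon of the empirical mean.
        return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

    def should_split(best_gain, second_gain, n, delta=1e-7, value_range=1.0, tau=0.05):
        # Split when the best attribute beats the runner-up by more than
        # epsilon, or when epsilon itself has shrunk below the tie threshold.
        eps = hoeffding_bound(value_range, delta, n)
        return (best_gain - second_gain > eps) or (eps < tau)

    print(should_split(0.21, 0.17, n=1000))    # False: bound still too loose
    print(should_split(0.21, 0.17, n=10000))   # True: bound has tightened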
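
Regarding note 2, here is a deliberately simple epsilon-greedy bandit sketch (not an algorithm from this survey): the learner picks one arm per round, observes only that arm's reward (the partial, right-or-wrong style of feedback mentioned above), and balances exploration against exploitation; cumulative regret is measured against always playing the best arm. The arm probabilities and parameters are made up.

    # Epsilon-greedy multi-armed bandit with partial feedback.
    import random

    def epsilon_greedy(arm_reward_probs, rounds=10000, epsilon=0.1, seed=0):
        rng = random.Random(seed)
        k = len(arm_reward_probs)
        counts = [0] * k            # pulls per arm
        values = [0.0] * k          # running mean reward per arm
        total_reward = 0.0
        for _ in range(rounds):
            if rng.random() < epsilon:                 # explore
                arm = rng.randrange(k)
            else:                                      # exploit current best
                arm = max(range(k), key=lambda a: values[a])
            reward = 1.0 if rng.random() < arm_reward_probs[arm] else 0.0
            counts[arm] += 1
            values[arm] += (reward - values[arm]) / counts[arm]
            total_reward += reward
        regret = max(arm_reward_probs) * rounds - total_reward
        return values, regret

    # Toy usage: three arms; the learner should concentrate on the 0.7 arm.
    values, regret = epsilon_greedy([0.2, 0.5, 0.7])
    print(values, regret)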

References

  1. Guyon, I., Lemaire, V., Dror, G., Vogel, D.: Analysis of the KDD Cup 2009: fast scoring on a large Orange customer database. In: JMLR: Workshop and Conference Proceedings, vol. 7, pp. 1–22 (2009)


  2. Féraud, R., Boullé, M., Clérot, F., Fessant, F., Lemaire, V.: The orange customer analysis platform. In: Perner, P. (ed.) ICDM 2010. LNCS, vol. 6171, pp. 584–594. Springer, Heidelberg (2010)


  3. Almaksour, A., Mouchère, H., Anquetil, E.: Apprentissage incrémental et synthèse de données pour la reconnaissance de caractères manuscrits en-ligne. In: Dixième Colloque International Francophone sur l’écrit et le Document (2009)


  4. Saunier, N., Midenet, S., Grumbach, A.: Apprentissage incrémental par sélection de données dans un flux pour une application de securité routière. In: Conférence d’Apprentissage (CAP), pp. 239–251 (2004)


  5. Provost, F., Kolluri, V.: A survey of methods for scaling up inductive algorithms. Data Min. Knowl. Discov. 3(2), 131–169 (1999)


  6. Dean, T., Boddy, M.: An analysis of time-dependent planning. In: Proceedings of the Seventh National Conference on Artificial Intelligence, pp. 49–54 (1988)


  7. Michalski, R.S., Mozetic, I., Hong, J., Lavrac, N.: The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. In: Proceedings of the Fifth National Conference on Artificial Intelligence, pp. 1041–1045 (1986)


  8. Gama, J.: Knowledge Discovery from Data Streams. Chapman and Hall/CRC Press, Atlanta (2010)


  9. Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., Lawrence, N.D.: Dataset Shift in Machine Learning. MIT Press, Cambridge (2009)


  10. Bondu, A., Lemaire, V.: État de l’art sur les méthodes statistiques d’apprentissage actif. RNTI A2, Apprentissage artificiel et fouille de données, 189 (2008)


  11. Cornuéjols, A.: On-line learning: where are we so far? In: May, M., Saitta, L. (eds.) Ubiquitous Knowledge Discovery. LNCS, vol. 6202, pp. 129–147. Springer, Heidelberg (2010)


  12. Zilberstein, S., Russell, S.: Optimal composition of real-time systems. Artif. Intell. 82(1), 181–213 (1996)


  13. Quinlan, J.R.: Learning efficient classification procedures and their application to chess end games. In: Michalski, R.S., Carbonell, J.G., Mitchell, T.M. (eds.) Machine Learning - An Artificial Intelligence Approach, pp. 463–482. Springer, Heidelberg (1986)


  14. Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Chapman and Hall/CRC, Boca Raton (1984)


  15. Cornuéjols, A., Miclet, L.: Apprentissage artificiel - Concepts et algorithmes. Eyrolles (2010)


  16. Schlimmer, J., Fisher, D.: A case study of incremental concept induction. In: Proceedings of the Fifth National Conference on Artificial Intelligence, pp. 496–501 (1986)


  17. Utgoff, P.: Incremental induction of decision trees. Mach. Learn. 4(2), 161–186 (1989)


  18. Utgoff, P., Berkman, N., Clouse, J.: Decision tree induction based on efficient tree restructuring. Mach. Learn. 29(1), 5–44 (1997)


  19. Boser, B., Guyon, I., Vapnik, V.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152. ACM, New York (1992)


  20. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)


  21. Domeniconi, C., Gunopulos, D.: Incremental support vector machine construction. In: ICDM, pp. 589–592 (2001)


  22. Syed, N., Liu, H., Sung, K.: Handling concept drifts in incremental learning with support vector machines. In: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 317–321. ACM, New York (1999)


  23. Fung, G., Mangasarian, O.: Incremental support vector machine classification. In: Proceedings of the Second SIAM International Conference on Data Mining, Arlington, Virginia, pp. 247–260 (2002)


  24. Bordes, A., Bottou, L.: The Huller: a simple and efficient online SVM. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 505–512. Springer, Heidelberg (2005)


  25. Bordes, A., Ertekin, S., Weston, J., Bottou, L.: Fast kernel classifiers with online and active learning. J. Mach. Learn. Res. 6, 1579–1619 (2005)


  26. Loosli, G., Canu, S., Bottou, L.: SVM et apprentissage des très grandes bases de données. In: Cap Conférence d’apprentissage (2006)


  27. Lallich, S., Teytaud, O., Prudhomme, E.: Association rule interestingness: measure and statistical validation. In: Guillet, F., Hamilton, H. (eds.) Quality Measures in Data Mining. SCI, vol. 43, pp. 251–275. Springer, Heidelberg (2007)


  28. Schlimmer, J., Granger, R.: Incremental learning from noisy data. Mach. Learn. 1(3), 317–354 (1986)


  29. Widmer, G., Kubat, M.: Learning in the presence of concept drift and hidden contexts. Mach. Learn. 23(1), 69–101 (1996)


  30. Maloof, M., Michalski, R.: Selecting examples for partial memory learning. Mach. Learn. 41(1), 27–52 (2000)


  31. Langley, P., Iba, W., Thompson, K.: An analysis of Bayesian classifiers. In: International Conference on Artificial Intelligence, pp. 223–228. AAAI (1992)


  32. Domingos, P., Pazzani, M.: On the optimality of the simple Bayesian classifier under zero-one loss. Mach. Learn. 29, 103–130 (1997)


  33. Kohavi, R.: Scaling up the accuracy of naive-Bayes classifiers: a decision-tree hybrid. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, vol. 7. AAAI Press, Menlo Park (1996)


  34. Heinz, C.: Density estimation over data streams (2007)


  35. John, G., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345. Morgan Kaufmann (1995)


  36. Lu, J., Yang, Y., Webb, G.I.: Incremental discretization for Naïve-Bayes classifier. In: Li, X., Zaïane, O.R., Li, Z. (eds.) ADMA 2006. LNCS (LNAI), vol. 4093, pp. 223–238. Springer, Heidelberg (2006)


  37. Aha, D.W. (ed.): Lazy Learning. Springer, New York (1997)


  38. Brighton, H., Mellish, C.: Advances in instance selection for instance-based learning algorithms. Data Min. Knowl. Discov. 6(2), 153–172 (2002)


  39. Hooman, V., Li, C.S., Castelli, V.: Fast search and learning for fast similarity search. In: Storage and Retrieval for Media Databases, vol. 3972, pp. 32–42 (2000)


  40. Moreno-Seco, F., Micó, L., Oncina, J.: Extending LAESA fast nearest neighbour algorithm to find the \(k\) nearest neighbours. In: Caelli, T.M., Amin, A., Duin, R.P.W., Kamel, M.S., de Ridder, D. (eds.) SPR 2002 and SSPR 2002. LNCS, vol. 2396, pp. 718–724. Springer, Heidelberg (2002)


  41. Kononenko, I., Robnik, M.: Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn. 53, 23–69 (2003)


  42. Globerson, A., Roweis, S.: Metric learning by collapsing classes. In: Neural Information Processing Systems (NIPS) (2005)


  43. Weinberger, K., Saul, L.: Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. (JMLR) 10, 207–244 (2009)


  44. Sankaranarayanan, J., Samet, H., Varshney, A.: A fast all nearest neighbor algorithm for applications involving large point-clouds. Comput. Graph. 31, 157–174 (2007)


  45. Domingos, P., Hulten, G.: Catching up with the data: research issues in mining data streams. In: Workshop on Research Issues in Data Mining and Knowledge Discovery (2001)


  46. Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R.: Advances in Knowledge Discovery and Data Mining. American Association for Artificial Intelligence, Menlo Park (1996)


  47. Stonebraker, M., Çetintemel, U., Zdonik, S.: The 8 requirements of real-time stream processing. ACM SIGMOD Rec. 34(4), 42–47 (2005)


  48. Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 97–106. ACM, New York (2001)


  49. Zighed, D., Rakotomalala, R.: Graphes d’induction: apprentissage et data mining. Hermes Science Publications, Paris (2000)


  50. Mehta, M., Agrawal, R., Rissanen, J.: SLIQ: a fast scalable classifier for data mining. In: Apers, P.M.G., Bouzeghoub, M., Gardarin, G. (eds.) EDBT 1996. LNCS, vol. 1057, pp. 18–34. Springer, Heidelberg (1996)


  51. Shafer, J., Agrawal, R., Mehta, M.: SPRINT: a scalable parallel classifier for data mining. In: Proceedings of the International Conference on Very Large Data Bases, pp. 544–555 (1996)


  52. Gehrke, J., Ramakrishnan, R., Ganti, V.: RainForest - a framework for fast decision tree construction of large datasets. Data Min. Knowl. Disc. 4(2), 127–162 (2000)


  53. Oates, T., Jensen, D.: The effects of training set size on decision tree complexity. In: ICML 1997: Proceedings of the Fourteenth International Conference on Machine Learning, pp. 254–262 (1997)


  54. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963)


  55. Matuszyk, P., Krempl, G., Spiliopoulou, M.: Correcting the usage of the Hoeffding inequality in stream mining. In: Tucker, A., Höppner, F., Siebes, A., Swift, S. (eds.) IDA 2013. LNCS, vol. 8207, pp. 298–309. Springer, Heidelberg (2013)


  56. Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80. ACM, New York (2000)


  57. Gama, J., Rocha, R., Medas, P.: Accurate decision trees for mining high-speed data streams. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 523–528. ACM, New York (2003)


  58. Ramos-Jiménez, G., del Campo-Avila, J., Morales-Bueno, R.: Incremental algorithm driven by error margins. In: Todorovski, L., Lavrač, N., Jantke, K.P. (eds.) DS 2006. LNCS (LNAI), vol. 4265, pp. 358–362. Springer, Heidelberg (2006)


  59. del Campo-Avila, J., Ramos-Jiménez, G., Gama, J., Morales-Bueno, R.: Improving prediction accuracy of an incremental algorithm driven by error margins. Knowledge Discovery from Data Streams, 57 (2006)


  60. Kirkby, R.: Improving Hoeffding trees. Ph.D. thesis, University of Waikato (2008)


  61. Kohavi, R., Kunz, C.: Option decision trees with majority votes. In: ICML 1997: Proceedings of the Fourteenth International Conference on Machine Learning, pp. 161–169. Morgan Kaufmann Publishers Inc., San Francisco (1997)


  62. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)


  63. Schapire, R.E., Freund, Y.: Boosting: Foundations and Algorithms. MIT Press, Cambridge (2012)


  64. Wang, H., Fan, W., Yu, P.S., Han, J.: Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD 2003, pp. 226–235. ACM Press, New York (2003)


  65. Seidl, T., Assent, I., Kranen, P., Krieger, R., Herrmann, J.: Indexing density models for incremental learning and anytime classification on data streams. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp. 311–322. ACM (2009)


  66. Rutkowski, L., Pietruczuk, L., Duda, P., Jaworski, M.: Decision trees for mining data streams based on the McDiarmid’s bound. IEEE Trans. Knowl. Data Eng. 25(6), 1272–1279 (2013)


  67. Tsang, I., Kwok, J., Cheung, P.: Core vector machines: fast SVM training on very large data sets. J. Mach. Learn. Res. 6(1), 363 (2006)


  68. Dong, J.X., Krzyzak, A., Suen, C.Y.: Fast SVM training algorithm with decomposition on very large data sets. IEEE Trans. Pattern Anal. Mach. Intell. 27(4), 603–618 (2005)


  69. Usunier, N., Bordes, A., Bottou, L.: Guarantees for approximate incremental SVMs. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, vol. 9, pp. 884–891 (2010)


  70. Do, T., Nguyen, V., Poulet, F.: GPU-based parallel SVM algorithm. Jisuanji Kexue yu Tansuo 3(4), 368–377 (2009)


  71. Ferrer-Troyano, F., Aguilar-Ruiz, J.S., Riquelme, J.C.: Incremental rule learning based on example nearness from numerical data streams. In: Proceedings of the 2005 ACM Symposium on Applied Computing, p. 572. ACM (2005)


  72. Ferrer-Troyano, F., Aguilar-Ruiz, J., Riquelme, J.: Data streams classification by incremental rule learning with parameterized generalization. In: Proceedings of the 2006 ACM Symposium on Applied Computing, p. 661. ACM (2006)


  73. Gama, J.A., Kosina, P.: Learning decision rules from data streams. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, IJCAI 2011, vol. 2, pp. 1255–1260. AAAI Press (2011)


  74. Gama, J., Pinto, C.: Discretization from data streams: applications to histograms and data mining. In: Proceedings of the 2006 ACM Symposium on Applied Computing (2006)


  75. Gibbons, P., Matias, Y., Poosala, V.: Fast incremental maintenance of approximate histograms. ACM Trans. Database Syst. 27(3), 261–298 (2002)


  76. Vitter, J.: Random sampling with a reservoir. ACM Trans. Math. Softw. 11(1), 37–57 (1985)


  77. Salperwyck, C., Lemaire, V., Hue, C.: Incremental weighted naive Bayes classifiers for data streams. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg (2014)


  78. Law, Y.-N., Zaniolo, C.: An adaptive nearest neighbor classification algorithm for data streams. In: Jorge, A.M., Torgo, L., Brazdil, P.B., Camacho, R., Gama, J. (eds.) PKDD 2005. LNCS (LNAI), vol. 3721, pp. 108–120. Springer, Heidelberg (2005)


  79. Beringer, J., Hüllermeier, E.: Efficient instance-based learning on data streams. Intell. Data Anal. 11(6), 627–650 (2007)


  80. Shaker, A., Hüllermeier, E.: IBLStreams: a system for instance-based classification and regression on data streams. Evolving Syst. 3(4), 235–249 (2012)


  81. Cesa-Bianchi, N., Conconi, A., Gentile, C.: On the generalization ability of on-line learning algorithms. IEEE Trans. Inf. Theory 50(9), 2050–2057 (2004)


  82. Block, H.: The perceptron: a model for brain functioning. Rev. Mod. Phys. 34, 123–135 (1962)


  83. Novikoff, A.B.: On convergence proofs for perceptrons. In: Proceedings of the Symposium on the Mathematical Theory of Automata, vol. 12, pp. 615–622 (1963)


  84. Cesa-Bianchi, N., Lugosi, G.: Prediction, Learning, and Games. Cambridge University Press, New York (2006)


  85. Crammer, K., Kandola, J., Holloway, R., Singer, Y.: Online classification on a budget. In: Advances in Neural Information Processing Systems 16. MIT Press, Cambridge (2003)


  86. Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., Singer, Y.: Online passive-aggressive algorithms. J. Mach. Learn. Res. 7, 551–585 (2006)


  87. Shalev-Shwartz, S., Singer, Y., Srebro, N.: Pegasos: primal estimated sub-gradient solver for SVM. In: Proceedings of the 24th International Conference on Machine Learning, ICML 2007, pp. 807–814. ACM, New York (2007)


  88. Kivinen, J., Smola, A.J., Williamson, R.C.: Online learning with kernels. IEEE Trans. Sig. Process. 52(8), 2165–2176 (2004)


  89. Engel, Y., Mannor, S., Meir, R.: The kernel recursive least squares algorithm. IEEE Trans. Sig. Process. 52, 2275–2285 (2003)


  90. Csató, L., Opper, M.: Sparse on-line Gaussian processes. Neural Comput. 14(3), 641–668 (2002)


  91. Thompson, W.R.: On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25, 285–294 (1933)


  92. Bubeck, S., Cesa-Bianchi, N.: Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Found. Trends Mach. Learn. 5(1), 1–122 (2012)


  93. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1), 48–77 (2003)


  94. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, New York (2004)


  95. Sutskever, I.: A simpler unified analysis of budget perceptrons. In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, 14–18 June, pp. 985–992 (2009)


  96. Dekel, O., Shalev-Shwartz, S., Singer, Y.: The forgetron: a kernel-based perceptron on a budget. SIAM J. Comput. 37(5), 1342–1372 (2008)


  97. Orabona, F., Keshet, J., Caputo, B.: The projectron: a bounded kernel-based perceptron. In: International Conference on Machine Learning (2008)


  98. Bifet, A., Holmes, G., Pfahringer, B., Kirkby, R., Gavaldà, R.: New ensemble methods for evolving data streams. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD 2009, p. 139 (2009)


  99. Žliobaite, I.: Learning under concept drift: an overview. CoRR abs/1010.4784 (2010)


  100. Lazarescu, M.M., Venkatesh, S., Bui, H.H.: Using multiple windows to track concept drift. Intell. Data Anal. 8(1), 29–59 (2004)


  101. Bifet, A., Gama, J., Pechenizkiy, M., Žliobaite, I.: PAKDD tutorial: handling concept drift: importance, challenges and solutions (2011)


  102. Marsland, S.: Novelty detection in learning systems. Neural Comput. Surv. 3, 157–195 (2003)


  103. Faria, E.R., Goncalves, I.J.C.R., Gama, J., Carvalho, A.C.P.L.F.: Evaluation methodology for multiclass novelty detection algorithms. In: Brazilian Conference on Intelligent Systems, BRACIS 2013, Fortaleza, CE, Brazil, 19–24 October, pp. 19–25 (2013)


  104. Gama, J., Medas, P., Castillo, G., Rodrigues, P.: Learning with drift detection. In: Bazzan, A.L.C., Labidi, S. (eds.) SBIA 2004. LNCS (LNAI), vol. 3171, pp. 286–295. Springer, Heidelberg (2004)


  105. Baena-García, M., Del Campo-Ávila, J., Fidalgo, R., Bifet, A., Gavaldà, R., Morales-Bueno, R.: Early drift detection method. In: Fourth International Workshop on Knowledge Discovery from Data Streams, vol. 6, pp. 77–86 (2006)


  106. Gama, J., Rodrigues, P.P., Sebastiao, R., Rodrigues, P.: Issues in evaluation of stream learning algorithms. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 329–338. ACM, New York (2009)


  107. Page, E.: Continuous inspection schemes. Biometrika 41(1–2), 100 (1954)


  108. Mouss, H., Mouss, D., Mouss, N., Sefouhi, L.: Test of Page-Hinkley, an approach for fault detection in an agro-alimentary production system. In: 5th Asian Control Conference, vol. 2, pp. 815–818 (2004)


  109. Bondu, A., Boullé, M.: A supervised approach for change detection in data streams (2011)


  110. Boullé, M.: MODL: a Bayes optimal discretization method for continuous attributes. Mach. Learn. 65(1), 131–165 (2006)


  111. Minku, L., Yao, X.: DDD: a new ensemble approach for dealing with concept drift. IEEE Trans. Knowl. Data Eng. 24, 619–633 (2012)


  112. Widmer, G., Kubat, M.: Learning flexible concepts from streams of examples: FLORA2. In: Proceedings of the 10th European Conference on Artificial Intelligence, pp. 463–467. Wiley (1992)


  113. Greenwald, M., Khanna, S.: Space-efficient online computation of quantile summaries. In: SIGMOD, pp. 58–66 (2001)


  114. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)


  115. Street, W., Kim, Y.: A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 377–382. ACM, New York (2001)


  116. Kolter, J., Maloof, M.: Dynamic weighted majority: a new ensemble method for tracking concept drift. In: Proceedings of the Third International IEEE Conference on Data Mining, pp. 123–130 (2003)


  117. Bifet, A., Gavalda, R.: Learning from time-changing data with adaptive windowing. In: SIAM International Conference on Data Mining, pp. 443–448 (2007)


  118. Jaber, G.: An approach for online learning in the presence of concept changes. Ph.D. thesis, Université AgroParisTech (France) (2013)


  119. Gama, J., Kosina, P.: Tracking recurring concepts with metalearners. In: Progress in Artificial Intelligence: 14th Portuguese Conference on Artificial Intelligence, p. 423 (2009)


  120. Gomes, J.B., Menasalvas, E., Sousa, P.A.C.: Tracking recurrent concepts using context. In: Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q. (eds.) RSCTC 2010. LNCS, vol. 6086, pp. 168–177. Springer, Heidelberg (2010)


  121. Salganicoff, M.: Tolerating concept and sampling shift in lazy learning using prediction error context switching. Artif. Intell. Rev. 11(1), 133–155 (1997)


  122. Cao, F., Ester, M., Qian, W., Zhou, A.: Density-based clustering over an evolving data stream with noise. In: 2006 SIAM Conference on Data Mining, pp. 328–339 (2006)


  123. Quionero-Candela, J., Sugiyama, M., Schwaighofer, A., Lawrence, N.D.: Dataset Shift in Machine Learning. The MIT Press, Cambridge (2009)


  124. Bifet, A., Gama, J., Gavalda, R., Krempl, G., Pechenizkiy, M., Pfahringer, B., Spiliopoulou, M., Žliobaite, I.: Advanced topics on data stream mining. Tutorial at ECML PKDD 2012 (2012)


  125. Fawcett, T.: ROC graphs: notes and practical considerations for researchers. Mach. Learn. 31, 1–38 (2004)


  126. Bifet, A., Read, J., Žliobaité, I., Pfahringer, B., Holmes, G.: Pitfalls in benchmarking data stream classification and how to avoid them. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds.) ECML PKDD 2013, Part I. LNCS (LNAI), vol. 8188, pp. 465–479. Springer, Heidelberg (2013)


  127. Žliobaité, I., Bifet, A., Read, J., Pfahringer, B., Holmes, G.: Evaluation methods and decision theory for classification of streaming data with temporal dependence. Mach. Learn. 98, 455–482 (2015)


  128. Dawid, A.: Present position and potential developments: some personal views: statistical theory: the prequential approach. J. Roy. Stat. Soc. Ser. A (General) 147, 278–292 (1984)


  129. Brzezinski, D., Stefanowski, J.: Prequential AUC for classifier evaluation and drift detection in evolving data streams. In: Proceedings of the Workshop New Frontiers in Mining Complex Patterns (NFMCP 2014) held in European Conference on Machine Learning (ECML) (2014)


  130. Bifet, A.: Adaptive learning and mining for data streams and frequent patterns. Ph.D. thesis, Universitat Politécnica de Catalunya (2009)


  131. Agrawal, R.: Database mining: a performance perspective. IEEE Trans. Knowl. Data Eng. 5(6), 914–925 (1993)


  132. Gama, J., Medas, P., Rodrigues, P.: Learning decision trees from dynamic data streams. J. Univ. Comput. Sci. 11(8), 1353–1366 (2005)


  133. Bifet, A., Kirkby, R.: Data stream mining: a practical approach. Technical report, University of Waikato (2009)


  134. Minku, L.L., White, A.P., Yao, X.: The impact of diversity on online ensemble learning in the presence of concept drift. IEEE Trans. Knowl. Data Eng. 22(5), 730–742 (2010)


  135. Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A.: A detailed analysis of the KDD Cup 99 data set. In: Proceedings of the Second IEEE International Conference on Computational Intelligence for Security and Defense Applications, CISDA 2009, pp. 53–58. IEEE Press, Piscataway (2009)


  136. Katakis, I., Tsoumakas, G., Vlahavas, I.: Tracking recurring contexts using ensemble classifiers: an application to email filtering. Knowl. Inf. Syst. 22(3), 371–391 (2010)


  137. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Series in Data Management Systems, 2nd edn. Morgan Kaufmann, San Francisco (2005)


  138. Žliobaité, I., Budka, M., Stahl, F.: Towards cost-sensitive adaptation: when is it worth updating your predictive model? Neurocomputing 150, 240–249 (2014)


  139. Bifet, A., Holmes, G., Pfahringer, B., Frank, E.: Fast perceptron decision tree learning from evolving data streams. In: Zaki, M.J., Yu, J.X., Ravindran, B., Pudi, V. (eds.) PAKDD 2010. LNCS, vol. 6119, pp. 299–310. Springer, Heidelberg (2010)


  140. Littlestone, N., Warmuth, M.: The weighted majority algorithm. In: 30th Annual Symposium on Foundations of Computer Science, pp. 256–261 (1989)


  141. Krempl, G., Žliobaite, I., Brzezinski, D., Hüllermeier, E., Last, M., Lemaire, V., Noack, T., Shaker, A., Sievi, S., Spiliopoulou, M., Stefanowski, J.: Open challenges for data stream mining research. SIGKDD Explorations (Special Issue on Big Data) 16, 1–10 (2014)



Author information

Correspondence to Vincent Lemaire.



Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Lemaire, V., Salperwyck, C., Bondu, A. (2015). A Survey on Supervised Classification on Data Streams. In: Zimányi, E., Kutsche, RD. (eds) Business Intelligence. eBISS 2014. Lecture Notes in Business Information Processing, vol 205. Springer, Cham. https://doi.org/10.1007/978-3-319-17551-5_4


  • DOI: https://doi.org/10.1007/978-3-319-17551-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17550-8

  • Online ISBN: 978-3-319-17551-5

  • eBook Packages: Computer Science, Computer Science (R0)
