
A Survey on Supervised Classification on Data Streams

Chapter in: Business Intelligence (eBISS 2014)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 205)

Abstract

The last ten years have been prolific for statistical learning and data mining, and it is now easy to find learning algorithms that are fast and automatic. Historically, a strong assumption was that all examples were available, or could be loaded into memory, so that learning algorithms could use them straight away. But new use cases generating large amounts of data have recently emerged: monitoring of telecommunication networks, user modeling in dynamic social networks, web mining, etc. The volume of data grows rapidly, and it is now necessary to use incremental learning algorithms on data streams. This article presents the main approaches to incremental supervised classification available in the literature. It aims to give basic knowledge to a reader new to this subject.
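
To make the streaming setting concrete, here is a minimal sketch (not taken from the chapter) of the test-then-train protocol that underlies most incremental classifiers: each example is first used to evaluate the current model and then, immediately and only once, to update it. The online perceptron, the synthetic stream and every name below are illustrative assumptions, not the chapter's method.

    # Minimal test-then-train (prequential) loop on a data stream.
    # The learner here is a plain online perceptron; labels are in {-1, +1}.
    import numpy as np

    def perceptron_stream(stream, n_features):
        w = np.zeros(n_features)
        b = 0.0
        errors, seen = 0, 0
        for x, y in stream:
            y_hat = 1 if np.dot(w, x) + b >= 0 else -1
            errors += int(y_hat != y)        # test first (prequential error)
            if y_hat != y:                   # then train on the same example
                w += y * x
                b += y
            seen += 1
        return w, b, errors / max(seen, 1)

    # Toy usage on a synthetic, linearly separable stream of 10,000 examples.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10000, 5))
    y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)
    w, b, prequential_error = perceptron_stream(zip(X, y), n_features=5)
    print(f"prequential error: {prequential_error:.3f}")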


Notes

  1. This bound is not used correctly in many incremental tree algorithms, as explained in [55], but this has little influence on the results (a sketch of the split test in question follows these notes).

  2. Multi-armed bandits explore and exploit an online set of decisions while minimizing the cumulative regret between the chosen decisions and the optimal decision (a bandit sketch also follows these notes). Originally, multi-armed bandits were used in pharmacology to choose the best drug while minimizing the number of tests. Today, they tend to replace A/B testing for website optimization (e.g., Google Analytics) and are used for ad-serving optimization. They are well suited when the true class to predict is not known: for instance, in some domains the learning algorithm receives only partial feedback on its prediction, i.e., a single bit of right-or-wrong, rather than the true label.

  3. http://moa.cms.waikato.ac.nz/datasets/.
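
Regarding note 1, the sketch below shows the standard VFDT-style use of the Hoeffding bound [54, 56] that reference [55] critiques: a leaf is split when the empirical gain of the best attribute exceeds that of the runner-up by more than epsilon = sqrt(R^2 ln(1/delta) / (2n)). The function names, the default delta, the value range R and the tie threshold tau are illustrative assumptions, not values prescribed by the chapter.

    # Hoeffding-bound split test as typically used in incremental trees.
    import math

    def hoeffding_bound(value_range, delta, n):
        # epsilon such that, with probability 1 - delta, the true mean of n
        # bounded observations is within epsilon of the empirical mean.
        return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

    def should_split(best_gain, second_gain, n, delta=1e-7, value_range=1.0, tau=0.05):
        # Split when the best attribute beats the runner-up by more than
        # epsilon, or when epsilon itself has shrunk below the tie threshold.
        eps = hoeffding_bound(value_range, delta, n)
        return (best_gain - second_gain > eps) or (eps < tau)

    print(should_split(0.21, 0.17, n=1000))    # False: bound still too loose
    print(should_split(0.21, 0.17, n=10000))   # True: bound has tightened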
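
Regarding note 2, here is a deliberately simple epsilon-greedy bandit sketch (not an algorithm from this survey): the learner picks one arm per round, observes only that arm's reward (the partial, right-or-wrong style of feedback mentioned above), and balances exploration against exploitation; cumulative regret is measured against always playing the best arm. The arm probabilities and parameters are made up.

    # Epsilon-greedy multi-armed bandit with partial feedback.
    import random

    def epsilon_greedy(arm_reward_probs, rounds=10000, epsilon=0.1, seed=0):
        rng = random.Random(seed)
        k = len(arm_reward_probs)
        counts = [0] * k            # pulls per arm
        values = [0.0] * k          # running mean reward per arm
        total_reward = 0.0
        for _ in range(rounds):
            if rng.random() < epsilon:                 # explore
                arm = rng.randrange(k)
            else:                                      # exploit current best
                arm = max(range(k), key=lambda a: values[a])
            reward = 1.0 if rng.random() < arm_reward_probs[arm] else 0.0
            counts[arm] += 1
            values[arm] += (reward - values[arm]) / counts[arm]
            total_reward += reward
        regret = max(arm_reward_probs) * rounds - total_reward
        return values, regret

    # Toy usage: three arms; the learner should concentrate on the 0.7 arm.
    values, regret = epsilon_greedy([0.2, 0.5, 0.7])
    print(values, regret)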

References

  1. Guyon, I., Lemaire, V., Dror, G., Vogel, D.: Analysis of the KDD Cup 2009: fast scoring on a large Orange customer database. In: JMLR: Workshop and Conference Proceedings, vol. 7, pp. 1–22 (2009)


  2. Féraud, R., Boullé, M., Clérot, F., Fessant, F., Lemaire, V.: The orange customer analysis platform. In: Perner, P. (ed.) ICDM 2010. LNCS, vol. 6171, pp. 584–594. Springer, Heidelberg (2010)


  3. Almaksour, A., Mouchère, H., Anquetil, E.: Apprentissage incrémental et synthèse de données pour la reconnaissance de caractères manuscrits en-ligne. In: Dixième Colloque International Francophone sur l’écrit et le Document (2009)


  4. Saunier, N., Midenet, S., Grumbach, A.: Apprentissage incrémental par sélection de données dans un flux pour une application de securité routière. In: Conférence d’Apprentissage (CAP), pp. 239–251 (2004)


  5. Provost, F., Kolluri, V.: A survey of methods for scaling up inductive algorithms. Data Min. Knowl. Discov. 3(2), 131–169 (1999)


  6. Dean, T., Boddy, M.: An analysis of time-dependent planning. In: Proceedings of the Seventh National Conference on Artificial Intelligence, pp. 49–54 (1988)


  7. Michalski, R.S., Mozetic, I., Hong, J., Lavrac, N.: The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. In: Proceedings of the Fifth National Conference on Artificial Intelligence, pp. 1041–1045 (1986)


  8. Gama, J.: Knowledge Discovery from Data Streams. Chapman and Hall/CRC Press, Atlanta (2010)


  9. Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., Lawrence, N.D.: Dataset Shift in Machine Learning. MIT Press, Cambridge (2009)


  10. Bondu, A., Lemaire, V.: État de l’art sur les méthodes statistiques d’apprentissage actif. RNTI A2, Apprentissage artificiel et fouille de données, 189 (2008)


  11. Cornuéjols, A.: On-line learning: where are we so far? In: May, M., Saitta, L. (eds.) Ubiquitous Knowledge Discovery. LNCS, vol. 6202, pp. 129–147. Springer, Heidelberg (2010)


  12. Zilberstein, S., Russell, S.: Optimal composition of real-time systems. Artif. Intell. 82(1), 181–213 (1996)


  13. Quinlan, J.R.: Learning efficient classification procedures and their application to chess end games. In: Michalski, R.S., Carbonell, J.G., Mitchell, T.M. (eds.) Machine Learning - An Artificial Intelligence Approach, pp. 463–482. Springer, Heidelberg (1986)


  14. Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Chapman and Hall/CRC, Boca Raton (1984)


  15. Cornuéjols, A., Miclet, L.: Apprentissage artificiel - Concepts et algorithmes. Eyrolles (2010)


  16. Schlimmer, J., Fisher, D.: A case study of incremental concept induction. In: Proceedings of the Fifth National Conference on Artificial Intelligence, pp. 496–501 (1986)


  17. Utgoff, P.: Incremental induction of decision trees. Mach. Learn. 4(2), 161–186 (1989)


  18. Utgoff, P., Berkman, N., Clouse, J.: Decision tree induction based on efficient tree restructuring. Mach. Learn. 29(1), 5–44 (1997)


  19. Boser, B., Guyon, I., Vapnik, V.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152. ACM, New York (1992)


  20. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)


  21. Domeniconi, C., Gunopulos, D.: Incremental support vector machine construction. In: ICDM, pp. 589–592 (2001)


  22. Syed, N., Liu, H., Sung, K.: Handling concept drifts in incremental learning with support vector machines. In: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 317–321. ACM, New York (1999)


  23. Fung, G., Mangasarian, O.: Incremental support vector machine classification. In: Proceedings of the Second SIAM International Conference on Data Mining, Arlington, Virginia, pp. 247–260 (2002)


  24. Bordes, A., Bottou, L.: The Huller: a simple and efficient online SVM. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 505–512. Springer, Heidelberg (2005)


  25. Bordes, A., Ertekin, S., Weston, J., Bottou, L.: Fast kernel classifiers with online and active learning. J. Mach. Learn. Res. 6, 1579–1619 (2005)


  26. Loosli, G., Canu, S., Bottou, L.: SVM et apprentissage des très grandes bases de données. In: Cap Conférence d’apprentissage (2006)


  27. Lallich, S., Teytaud, O., Prudhomme, E.: Association rule interestingness: measure and statistical validation. In: Guillet, F., Hamilton, H. (eds.) Quality Measures in Data Mining. SCI, vol. 43, pp. 251–275. Springer, Heidelberg (2007)


  28. Schlimmer, J., Granger, R.: Incremental learning from noisy data. Mach. Learn. 1(3), 317–354 (1986)


  29. Widmer, G., Kubat, M.: Learning in the presence of concept drift and hidden contexts. Mach. Learn. 23(1), 69–101 (1996)


  30. Maloof, M., Michalski, R.: Selecting examples for partial memory learning. Mach. Learn. 41(1), 27–52 (2000)


  31. Langley, P., Iba, W., Thompson, K.: An analysis of Bayesian classifiers. In: International Conference on Artificial Intelligence, pp. 223–228. AAAI (1992)


  32. Domingos, P., Pazzani, M.: On the optimality of the simple Bayesian classifier under zero-one loss. Mach. Learn. 29, 103–130 (1997)


  33. Kohavi, R.: Scaling up the accuracy of naive-Bayes classifiers: a decision-tree hybrid. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, vol. 7. AAAI Press, Menlo Park (1996)


  34. Heinz, C.: Density estimation over data streams (2007)


  35. John, G., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345. Morgan Kaufmann (1995)


  36. Lu, J., Yang, Y., Webb, G.I.: Incremental discretization for Naïve-Bayes classifier. In: Li, X., Zaïane, O.R., Li, Z. (eds.) ADMA 2006. LNCS (LNAI), vol. 4093, pp. 223–238. Springer, Heidelberg (2006)


  37. Aha, D.W. (ed.): Lazy Learning. Springer, New York (1997)


  38. Brighton, H., Mellish, C.: Advances in instance selection for instance-based learning algorithms. Data Min. Knowl. Discov. 6(2), 153–172 (2002)


  39. Hooman, V., Li, C.S., Castelli, V.: Fast search and learning for fast similarity search. In: Storage and Retrieval for Media Databases, vol. 3972, pp. 32–42 (2000)


  40. Moreno-Seco, F., Micó, L., Oncina, J.: Extending LAESA fast nearest neighbour algorithm to find the \(k\) nearest neighbours. In: Caelli, T.M., Amin, A., Duin, R.P.W., Kamel, M.S., de Ridder, D. (eds.) SPR 2002 and SSPR 2002. LNCS, vol. 2396, pp. 718–724. Springer, Heidelberg (2002)


  41. Kononenko, I., Robnik, M.: Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn. 53, 23–69 (2003)


  42. Globerson, A., Roweis, S.: Metric learning by collapsing classes. In: Neural Information Processing Systems (NIPS) (2005)


  43. Weinberger, K., Saul, L.: Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. (JMLR) 10, 207–244 (2009)


  44. Sankaranarayanan, J., Samet, H., Varshney, A.: A fast all nearest neighbor algorithm for applications involving large point-clouds. Comput. Graph. 31, 157–174 (2007)


  45. Domingos, P., Hulten, G.: Catching up with the data: research issues in mining data streams. In: Workshop on Research Issues in Data Mining and Knowledge Discovery (2001)


  46. Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R.: Advances in Knowledge Discovery and Data Mining. American Association for Artificial Intelligence, Menlo Park (1996)


  47. Stonebraker, M., Çetintemel, U., Zdonik, S.: The 8 requirements of real-time stream processing. ACM SIGMOD Rec. 34(4), 42–47 (2005)


  48. Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 97–106. ACM, New York (2001)


  49. Zighed, D., Rakotomalala, R.: Graphes d’induction: apprentissage et data mining. Hermes Science Publications, Paris (2000)


  50. Mehta, M., Agrawal, R., Rissanen, J.: SLIQ: a fast scalable classifier for data mining. In: Apers, P.M.G., Bouzeghoub, M., Gardarin, G. (eds.) EDBT 1996. LNCS, vol. 1057, pp. 18–34. Springer, Heidelberg (1996)


  51. Shafer, J., Agrawal, R., Mehta, M.: SPRINT: a scalable parallel classifier for data mining. In: Proceedings of the International Conference on Very Large Data Bases, pp. 544–555 (1996)


  52. Gehrke, J., Ramakrishnan, R., Ganti, V.: RainForest - a framework for fast decision tree construction of large datasets. Data Min. Knowl. Disc. 4(2), 127–162 (2000)


  53. Oates, T., Jensen, D.: The effects of training set size on decision tree complexity. In: ICML 1997: Proceedings of the Fourteenth International Conference on Machine Learning, pp. 254–262 (1997)


  54. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963)


  55. Matuszyk, P., Krempl, G., Spiliopoulou, M.: Correcting the usage of the Hoeffding inequality in stream mining. In: Tucker, A., Höppner, F., Siebes, A., Swift, S. (eds.) IDA 2013. LNCS, vol. 8207, pp. 298–309. Springer, Heidelberg (2013)


  56. Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80. ACM, New York (2000)


  57. Gama, J., Rocha, R., Medas, P.: Accurate decision trees for mining high-speed data streams. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 523–528. ACM, New York (2003)


  58. Ramos-Jiménez, G., del Campo-Avila, J., Morales-Bueno, R.: Incremental algorithm driven by error margins. In: Todorovski, L., Lavrač, N., Jantke, K.P. (eds.) DS 2006. LNCS (LNAI), vol. 4265, pp. 358–362. Springer, Heidelberg (2006)


  59. del Campo-Avila, J., Ramos-Jiménez, G., Gama, J., Morales-Bueno, R.: Improving prediction accuracy of an incremental algorithm driven by error margins. Knowledge Discovery from Data Streams, 57 (2006)


  60. Kirkby, R.: Improving Hoeffding trees. Ph.D. thesis, University of Waikato (2008)


  61. Kohavi, R., Kunz, C.: Option decision trees with majority votes. In: ICML 1997: Proceedings of the Fourteenth International Conference on Machine Learning, pp. 161–169. Morgan Kaufmann Publishers Inc., San Francisco (1997)


  62. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)


  63. Schapire, R.E., Freund, Y.: Boosting: Foundations and Algorithms. MIT Press, Cambridge (2012)


  64. Wang, H., Fan, W., Yu, P.S., Han, J.: Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD 2003, pp. 226–235. ACM Press, New York (2003)


  65. Seidl, T., Assent, I., Kranen, P., Krieger, R., Herrmann, J.: Indexing density models for incremental learning and anytime classification on data streams. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp. 311–322. ACM (2009)


  66. Rutkowski, L., Pietruczuk, L., Duda, P., Jaworski, M.: Decision trees for mining data streams based on the McDiarmid’s bound. IEEE Trans. Knowl. Data Eng. 25(6), 1272–1279 (2013)


  67. Tsang, I., Kwok, J., Cheung, P.: Core vector machines: fast SVM training on very large data sets. J. Mach. Learn. Res. 6(1), 363 (2006)


  68. Dong, J.X., Krzyzak, A., Suen, C.Y.: Fast SVM training algorithm with decomposition on very large data sets. IEEE Trans. Pattern Anal. Mach. Intell. 27(4), 603–618 (2005)


  69. Usunier, N., Bordes, A., Bottou, L.: Guarantees for approximate incremental SVMs. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, vol. 9, pp. 884–891 (2010)


  70. Do, T., Nguyen, V., Poulet, F.: GPU-based parallel SVM algorithm. Jisuanji Kexue yu Tansuo 3(4), 368–377 (2009)


  71. Ferrer-Troyano, F., Aguilar-Ruiz, J.S., Riquelme, J.C.: Incremental rule learning based on example nearness from numerical data streams. In: Proceedings of the 2005 ACM Symposium on Applied Computing, p. 572. ACM (2005)


  72. Ferrer-Troyano, F., Aguilar-Ruiz, J., Riquelme, J.: Data streams classification by incremental rule learning with parameterized generalization. In: Proceedings of the 2006 ACM Symposium on Applied Computing, p. 661. ACM (2006)


  73. Gama, J.A., Kosina, P.: Learning decision rules from data streams. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, IJCAI 2011, vol. 2, pp. 1255–1260. AAAI Press (2011)


  74. Gama, J., Pinto, C.: Discretization from data streams: applications to histograms and data mining. In: Proceedings of the 2006 ACM Symposium on Applied Computing (2006)


  75. Gibbons, P., Matias, Y., Poosala, V.: Fast incremental maintenance of approximate histograms. ACM Trans. Database Syst. 27(3), 261–298 (2002)


  76. Vitter, J.: Random sampling with a reservoir. ACM Trans. Math. Softw. 11(1), 37–57 (1985)


  77. Salperwyck, C., Lemaire, V., Hue, C.: Incremental weighted naive Bayes classifiers for data streams. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg (2014)


  78. Law, Y.-N., Zaniolo, C.: An adaptive nearest neighbor classification algorithm for data streams. In: Jorge, A.M., Torgo, L., Brazdil, P.B., Camacho, R., Gama, J. (eds.) PKDD 2005. LNCS (LNAI), vol. 3721, pp. 108–120. Springer, Heidelberg (2005)


  79. Beringer, J., Hüllermeier, E.: Efficient instance-based learning on data streams. Intell. Data Anal. 11(6), 627–650 (2007)


  80. Shaker, A., Hüllermeier, E.: IBLStreams: a system for instance-based classification and regression on data streams. Evolving Syst. 3(4), 235–249 (2012)


  81. Cesa-Bianchi, N., Conconi, A., Gentile, C.: On the generalization ability of on-line learning algorithms. IEEE Trans. Inf. Theory 50(9), 2050–2057 (2004)


  82. Block, H.: The perceptron: a model for brain functioning. Rev. Mod. Phys. 34, 123–135 (1962)


  83. Novikoff, A.B.: On convergence proofs for perceptrons. In: Proceedings of the Symposium on the Mathematical Theory of Automata, vol. 12, pp. 615–622 (1963)


  84. Cesa-Bianchi, N., Lugosi, G.: Prediction, Learning, and Games. Cambridge University Press, New York (2006)


  85. Crammer, K., Kandola, J., Holloway, R., Singer, Y.: Online classification on a budget. In: Advances in Neural Information Processing Systems 16. MIT Press, Cambridge (2003)


  86. Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., Singer, Y.: Online passive-aggressive algorithms. J. Mach. Learn. Res. 7, 551–585 (2006)


  87. Shalev-Shwartz, S., Singer, Y., Srebro, N.: Pegasos: primal estimated sub-gradient solver for SVM. In: Proceedings of the 24th International Conference on Machine Learning, ICML 2007, pp. 807–814. ACM, New York (2007)


  88. Kivinen, J., Smola, A.J., Williamson, R.C.: Online learning with kernels. IEEE Trans. Sig. Process. 52(8), 2165–2176 (2004)


  89. Engel, Y., Mannor, S., Meir, R.: The kernel recursive least squares algorithm. IEEE Trans. Sig. Process. 52, 2275–2285 (2003)


  90. Csató, L., Opper, M.: Sparse on-line Gaussian processes. Neural Comput. 14(3), 641–668 (2002)


  91. Thompson, W.R.: On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25, 285–294 (1933)


  92. Bubeck, S., Cesa-Bianchi, N.: Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Found. Trends Mach. Learn. 5(1), 1–122 (2012)


  93. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1), 48–77 (2003)


  94. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, New York (2004)


  95. Sutskever, I.: A simpler unified analysis of budget perceptrons. In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, 14–18 June, pp. 985–992 (2009)


  96. Dekel, O., Shalev-Shwartz, S., Singer, Y.: The forgetron: a kernel-based perceptron on a budget. SIAM J. Comput. 37(5), 1342–1372 (2008)


  97. Orabona, F., Keshet, J., Caputo, B.: The projectron: a bounded kernel-based perceptron. In: International Conference on Machine Learning (2008)


  98. Bifet, A., Holmes, G., Pfahringer, B., Kirkby, R., Gavaldà, R.: New ensemble methods for evolving data streams. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD 2009, p. 139 (2009)


  99. Žliobaite, I.: Learning under concept drift: an overview. CoRR abs/1010.4784 (2010)


  100. Lazarescu, M.M., Venkatesh, S., Bui, H.H.: Using multiple windows to track concept drift. Intell. Data Anal. 8(1), 29–59 (2004)


  101. Bifet, A., Gama, J., Pechenizkiy, M., Žliobaite, I.: PAKDD tutorial: handling concept drift: importance, challenges and solutions (2011)


  102. Marsland, S.: Novelty detection in learning systems. Neural Comput. Surv. 3, 157–195 (2003)


  103. Faria, E.R., Goncalves, I.J.C.R., Gama, J., Carvalho, A.C.P.L.F.: Evaluation methodology for multiclass novelty detection algorithms. In: Brazilian Conference on Intelligent Systems, BRACIS 2013, Fortaleza, CE, Brazil, 19–24 October, pp. 19–25 (2013)


  104. Gama, J., Medas, P., Castillo, G., Rodrigues, P.: Learning with drift detection. In: Bazzan, A.L.C., Labidi, S. (eds.) SBIA 2004. LNCS (LNAI), vol. 3171, pp. 286–295. Springer, Heidelberg (2004)


  105. Baena-García, M., Del Campo-Ávila, J., Fidalgo, R., Bifet, A., Gavaldà, R., Morales-Bueno, R.: Early drift detection method. In: Fourth International Workshop on Knowledge Discovery from Data Streams, vol. 6, pp. 77–86 (2006)


  106. Gama, J., Rodrigues, P.P., Sebastiao, R., Rodrigues, P.: Issues in evaluation of stream learning algorithms. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 329–338. ACM, New York (2009)


  107. Page, E.: Continuous inspection schemes. Biometrika 41(1–2), 100 (1954)


  108. Mouss, H., Mouss, D., Mouss, N., Sefouhi, L.: Test of Page-Hinkley, an approach for fault detection in an agro-alimentary production system. In: 5th Asian Control Conference, vol. 2, pp. 815–818 (2004)


  109. Bondu, A., Boullé, M.: A supervised approach for change detection in data streams (2011)


  110. Boullé, M.: MODL: a Bayes optimal discretization method for continuous attributes. Mach. Learn. 65(1), 131–165 (2006)


  111. Minku, L., Yao, X.: DDD: a new ensemble approach for dealing with concept drift. IEEE Trans. Knowl. Data Eng. 24, 619–633 (2012)


  112. Widmer, G., Kubat, M.: Learning flexible concepts from streams of examples: FLORA2. In: Proceedings of the 10th European Conference on Artificial Intelligence, pp. 463–467. Wiley (1992)


  113. Greenwald, M., Khanna, S.: Space-efficient online computation of quantile summaries. In: SIGMOD, pp. 58–66 (2001)


  114. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)


  115. Street, W., Kim, Y.: A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 377–382. ACM, New York (2001)


  116. Kolter, J., Maloof, M.: Dynamic weighted majority: a new ensemble method for tracking concept drift. In: Proceedings of the Third International IEEE Conference on Data Mining, pp. 123–130 (2003)


  117. Bifet, A., Gavalda, R.: Learning from time-changing data with adaptive windowing. In: SIAM International Conference on Data Mining, pp. 443–448 (2007)


  118. Jaber, G.: An approach for online learning in the presence of concept changes. Ph.D. thesis, Université AgroParisTech (France) (2013)


  119. Gama, J., Kosina, P.: Tracking recurring concepts with metalearners. In: Progress in Artificial Intelligence: 14th Portuguese Conference on Artificial Intelligence, p. 423 (2009)


  120. Gomes, J.B., Menasalvas, E., Sousa, P.A.C.: Tracking recurrent concepts using context. In: Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q. (eds.) RSCTC 2010. LNCS, vol. 6086, pp. 168–177. Springer, Heidelberg (2010)


  121. Salganicoff, M.: Tolerating concept and sampling shift in lazy learning using prediction error context switching. Artif. Intell. Rev. 11(1), 133–155 (1997)


  122. Cao, F., Ester, M., Qian, W., Zhou, A.: Density-based clustering over an evolving data stream with noise. In: 2006 SIAM Conference on Data Mining, pp. 328–339 (2006)


  123. Quionero-Candela, J., Sugiyama, M., Schwaighofer, A., Lawrence, N.D.: Dataset Shift in Machine Learning. The MIT Press, Cambridge (2009)


  124. Bifet, A., Gama, J., Gavalda, R., Krempl, G., Pechenizkiy, M., Pfahringer, B., Spiliopoulou, M., Žliobaite, I.: Advanced topics on data stream mining. Tutorial at ECML PKDD 2012 (2012)


  125. Fawcett, T.: ROC graphs: notes and practical considerations for researchers. Mach. Learn. 31, 1–38 (2004)


  126. Bifet, A., Read, J., Žliobaité, I., Pfahringer, B., Holmes, G.: Pitfalls in benchmarking data stream classification and how to avoid them. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds.) ECML PKDD 2013, Part I. LNCS (LNAI), vol. 8188, pp. 465–479. Springer, Heidelberg (2013)


  127. Žliobaité, I., Bifet, A., Read, J., Pfahringer, B., Holmes, G.: Evaluation methods and decision theory for classification of streaming data with temporal dependence. Mach. Learn. 98, 455–482 (2015)


  128. Dawid, A.: Present position and potential developments: some personal views: statistical theory: the prequential approach. J. Roy. Stat. Soc. Ser. A (General) 147, 278–292 (1984)


  129. Brzezinski, D., Stefanowski, J.: Prequential AUC for classifier evaluation and drift detection in evolving data streams. In: Proceedings of the Workshop New Frontiers in Mining Complex Patterns (NFMCP 2014) held in European Conference on Machine Learning (ECML) (2014)


  130. Bifet, A.: Adaptive learning and mining for data streams and frequent patterns. Ph.D. thesis, Universitat Politécnica de Catalunya (2009)


  131. Agrawal, R.: Database mining: a performance perspective. IEEE Trans. Knowl. Data Eng. 5(6), 914–925 (1993)


  132. Gama, J., Medas, P., Rodrigues, P.: Learning decision trees from dynamic data streams. J. Univ. Comput. Sci. 11(8), 1353–1366 (2005)


  133. Bifet, A., Kirkby, R.: Data stream mining: a practical approach. Technical report, University of Waikato (2009)


  134. Minku, L.L., White, A.P., Yao, X.: The impact of diversity on online ensemble learning in the presence of concept drift. IEEE Trans. Knowl. Data Eng. 22(5), 730–742 (2010)


  135. Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A.: A detailed analysis of the KDD Cup 99 data set. In: Proceedings of the Second IEEE International Conference on Computational Intelligence for Security and Defense Applications, CISDA 2009, pp. 53–58. IEEE Press, Piscataway (2009)


  136. Katakis, I., Tsoumakas, G., Vlahavas, I.: Tracking recurring contexts using ensemble classifiers: an application to email filtering. Knowl. Inf. Syst. 22(3), 371–391 (2010)


  137. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Series in Data Management Systems, 2nd edn. Morgan Kaufmann, San Francisco (2005)


  138. Žliobaité, I., Budka, M., Stahl, F.: Towards cost-sensitive adaptation: when is it worth updating your predictive model? Neurocomputing 150, 240–249 (2014)


  139. Bifet, A., Holmes, G., Pfahringer, B., Frank, E.: Fast perceptron decision tree learning from evolving data streams. In: Zaki, M.J., Yu, J.X., Ravindran, B., Pudi, V. (eds.) PAKDD 2010. LNCS, vol. 6119, pp. 299–310. Springer, Heidelberg (2010)


  140. Littlestone, N., Warmuth, M.: The weighted majority algorithm. In: 30th Annual Symposium on Foundations of Computer Science, pp. 256–261 (1989)


  141. Krempl, G., Žliobaite, I., Brzezinski, D., Hüllermeier, E., Last, M., Lemaire, V., Noack, T., Shaker, A., Sievi, S., Spiliopoulou, M., Stefanowski, J.: Open challenges for data stream mining research. SIGKDD Explorations (Special Issue on Big Data) 16, 1–10 (2014)



Author information

Correspondence to Vincent Lemaire.



Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Lemaire, V., Salperwyck, C., Bondu, A. (2015). A Survey on Supervised Classification on Data Streams. In: Zimányi, E., Kutsche, RD. (eds) Business Intelligence. eBISS 2014. Lecture Notes in Business Information Processing, vol 205. Springer, Cham. https://doi.org/10.1007/978-3-319-17551-5_4


  • DOI: https://doi.org/10.1007/978-3-319-17551-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17550-8

  • Online ISBN: 978-3-319-17551-5

  • eBook Packages: Computer Science, Computer Science (R0)
