Abstract
This paper addresses learning from data streams with concept drift and proposes a framework for online learning. Classifiers are assumed to be induced from incoming blocks of prototypes, called data chunks. Each data chunk consists of prototypes, together with information on whether the class prediction for each of these instances was correct. When a new data chunk is formed, the classifier ensembles built at earlier stages are updated. Three online learning algorithms for data streams are considered, each based on a different prototype selection approach to forming data chunks. The proposed approach is validated experimentally, and the results of the computational experiment are discussed.
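The chunk-and-ensemble loop described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the three prototype selection approaches, so this sketch simply keeps every incoming instance as a prototype, uses a 1-nearest-neighbour rule as the base classifier, and updates the ensemble by dropping its oldest member. The names `ChunkEnsemble`, `chunk_size`, `max_members`, and `learn_one` are all hypothetical.

```python
from collections import Counter, deque
from dataclasses import dataclass, field


def nearest_label(prototypes, x):
    """Label of the prototype closest to x (squared Euclidean distance)."""
    best = min(prototypes, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    return best[1]


@dataclass
class ChunkEnsemble:
    chunk_size: int = 4    # hypothetical: instances per data chunk
    max_members: int = 3   # hypothetical: ensemble size limit
    members: deque = field(default_factory=deque)  # one prototype list per classifier
    chunk: list = field(default_factory=list)      # chunk currently being formed

    def predict(self, x):
        """Majority vote over the 1-NN members; None before the first chunk."""
        if not self.members:
            return None
        votes = Counter(nearest_label(m, x) for m in self.members)
        return votes.most_common(1)[0][0]

    def learn_one(self, x, y):
        """Predict first, then store the instance with a 'prediction was correct' flag."""
        predicted = self.predict(x)
        self.chunk.append((tuple(x), y, predicted == y))
        if len(self.chunk) == self.chunk_size:
            # Assumption: every instance in the chunk is kept as a prototype.
            self.members.append(list(self.chunk))
            if len(self.members) > self.max_members:
                self.members.popleft()  # assumption: forget the oldest classifier
            self.chunk = []
        return predicted


# Usage: learn one concept, then let the labels flip (a simulated drift).
model = ChunkEnsemble(chunk_size=4, max_members=2)
before = [((0.0,), "a"), ((1.0,), "b"), ((0.1,), "a"), ((0.9,), "b")]
after = [(x, "b" if y == "a" else "a") for x, y in before]
for x, y in before + after + after:
    model.learn_one(x, y)
```

Once two post-drift chunks have been formed, the pre-drift classifier has been pushed out of the ensemble, so predictions follow the new concept. A real prototype selection step would replace the line that copies the whole chunk.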
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Czarnowski, I., Jędrzejowicz, P. (2014). Online Learning Based on Prototypes. In: Nguyen, N.T., Attachoo, B., Trawiński, B., Somboonviwat, K. (eds) Intelligent Information and Database Systems. ACIIDS 2014. Lecture Notes in Computer Science, vol 8398. Springer, Cham. https://doi.org/10.1007/978-3-319-05458-2_20
DOI: https://doi.org/10.1007/978-3-319-05458-2_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05457-5
Online ISBN: 978-3-319-05458-2
eBook Packages: Computer Science, Computer Science (R0)