An Efficient Algorithm for Instance-Based Learning on Data Streams

Beringer, Jürgen; Hüllermeier, Eyke

doi:10.1007/978-3-540-73435-2_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4597)

Included in the following conference series:

Industrial Conference on Data Mining

Abstract

The processing of data streams in general and the mining of such streams in particular have recently attracted considerable attention in various research fields. A key problem in stream mining is to extend existing machine learning and data mining methods so as to meet the increased requirements imposed by the data stream scenario, including the ability to analyze incoming data in an online, incremental manner, to observe tight time and memory constraints, and to appropriately respond to changes of the data characteristics and underlying distributions, amongst others. This paper considers the problem of classification on data streams and develops an instance-based learning algorithm for that purpose. The experimental studies presented in the paper suggest that this algorithm has a number of desirable properties that are not, at least not as a whole, shared by currently existing alternatives. Notably, our method is very flexible and thus able to adapt to an evolving environment quickly, a point of utmost importance in the data stream context. At the same time, the algorithm is relatively robust and thus applicable to streams with different characteristics.
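
The requirements named in the abstract (online, incremental processing under tight memory limits, with adaptation to changing distributions) can be illustrated with a minimal sketch. The code below is not the algorithm developed in the paper, whose details are not given on this page. It is a plain sliding-window k-nearest-neighbour classifier, and the names `SlidingWindowKNN` and `prequential_accuracy`, as well as the parameters `k` and `window_size`, are assumptions introduced only for illustration.

```python
from collections import Counter, deque
import math


class SlidingWindowKNN:
    """Toy instance-based stream classifier: k-NN over the most recent examples.

    Hypothetical illustration only; not the method proposed in the paper.
    """

    def __init__(self, k=3, window_size=100):
        self.k = k
        # A deque with maxlen drops the oldest instance automatically,
        # which bounds memory and gradually forgets outdated concepts.
        self.window = deque(maxlen=window_size)

    def predict(self, x):
        """Return the majority label among the k nearest stored instances."""
        if not self.window:
            return None  # nothing has been learned yet
        nearest = sorted(self.window, key=lambda inst: math.dist(inst[0], x))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

    def learn(self, x, y):
        """Incrementally store one labelled instance (no retraining step)."""
        self.window.append((tuple(x), y))


def prequential_accuracy(model, stream):
    """Test-then-train evaluation: predict each example before learning it."""
    correct = total = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        total += 1
        model.learn(x, y)
    return correct / total if total else 0.0
```

In this sketch a small `window_size` makes the classifier react quickly to change but makes it more sensitive to noise, while a large window does the opposite. A fixed window is the simplest possible forgetting policy and is used here only to make that trade-off concrete.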

Author information

Authors and Affiliations

Jürgen Beringer: Fakultät für Informatik, Otto-von-Guericke-Universität Magdeburg
Eyke Hüllermeier: Fachbereich Mathematik und Informatik, Philipps-Universität Marburg

Editor information

Petra Perner

About this paper

Cite this paper

Beringer, J., Hüllermeier, E. (2007). An Efficient Algorithm for Instance-Based Learning on Data Streams. In: Perner, P. (ed.) Advances in Data Mining. Theoretical Aspects and Applications. ICDM 2007. Lecture Notes in Computer Science (LNAI), vol 4597. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73435-2_4

DOI: https://doi.org/10.1007/978-3-540-73435-2_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73434-5
Online ISBN: 978-3-540-73435-2
eBook Packages: Computer Science, Computer Science (R0)
