An Efficient Algorithm for Instance-Based Learning on Data Streams

  • Conference paper
Advances in Data Mining. Theoretical Aspects and Applications (ICDM 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4597)

Abstract

The processing of data streams in general and the mining of such streams in particular have recently attracted considerable attention in various research fields. A key problem in stream mining is to extend existing machine learning and data mining methods so as to meet the increased requirements imposed by the data stream scenario, including the ability to analyze incoming data in an online, incremental manner, to observe tight time and memory constraints, and to appropriately respond to changes of the data characteristics and underlying distributions, amongst others. This paper considers the problem of classification on data streams and develops an instance-based learning algorithm for that purpose. The experimental studies presented in the paper suggest that this algorithm has a number of desirable properties that are not, at least not as a whole, shared by currently existing alternatives. Notably, our method is very flexible and thus able to adapt to an evolving environment quickly, a point of utmost importance in the data stream context. At the same time, the algorithm is relatively robust and thus applicable to streams with different characteristics.

Editor information

Petra Perner

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Beringer, J., Hüllermeier, E. (2007). An Efficient Algorithm for Instance-Based Learning on Data Streams. In: Perner, P. (ed.) Advances in Data Mining. Theoretical Aspects and Applications. ICDM 2007. Lecture Notes in Computer Science, vol 4597. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73435-2_4

  • DOI: https://doi.org/10.1007/978-3-540-73435-2_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73434-5

  • Online ISBN: 978-3-540-73435-2

  • eBook Packages: Computer Science, Computer Science (R0)
