
A Survey of Classification Methods in Data Streams


Part of the book series: Advances in Database Systems (ADBS, volume 31)

Abstract

With advances in both hardware and software technologies, automated data generation and storage have become faster than ever. Data that is produced continuously in this way is referred to as a data stream. Streaming data is ubiquitous today, and it is often challenging to store, analyze and visualize such large, rapidly arriving volumes of data. Most conventional data mining techniques have to be adapted to run in a streaming environment because of the underlying resource constraints on memory and running time. Furthermore, data streams often exhibit concept drift (a change over time in the underlying relationship between the features and the class), which makes adapting conventional algorithms even more challenging. One important conventional data mining problem of this kind is classification, in which we attempt to model the class variable on the basis of one or more feature variables. While this problem has been studied extensively from a conventional mining perspective, it is much more challenging in the data stream domain. In this chapter, we revisit the problem of classification from the data stream perspective. Techniques for this problem need to be thoroughly redesigned to address resource constraints and concept drift. This chapter reviews the state-of-the-art techniques in the literature along with their respective advantages and disadvantages.
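To make the two constraints named in the abstract concrete, here is a minimal sketch, not taken from the chapter, of a one-pass classifier whose memory is capped and which forgets old data so that it can follow concept drift. The class name `SlidingWindowNN`, the `window` parameter and the choice of a 1-nearest-neighbour rule over a fixed-size window of recent records are illustrative assumptions, not the method of any surveyed technique.

```python
import math
from collections import deque


class SlidingWindowNN:
    """Hypothetical 1-nearest-neighbour classifier over the most recent labelled records.

    Memory is bounded by `window`; because the oldest records are evicted,
    the model gradually forgets an outdated concept after drift.
    """

    def __init__(self, window=100):
        # A deque with maxlen drops the oldest record automatically once full.
        self.buffer = deque(maxlen=window)

    def learn_one(self, x, y):
        # One pass: the record is consumed as it arrives; only the bounded buffer is kept.
        self.buffer.append((tuple(x), y))

    def predict_one(self, x):
        # Return None until at least one labelled record has been seen.
        if not self.buffer:
            return None
        # Label of the stored record closest to x (Euclidean distance).
        _, label = min(self.buffer, key=lambda rec: math.dist(rec[0], x))
        return label


if __name__ == "__main__":
    clf = SlidingWindowNN(window=4)
    region = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
    for point in region:  # old concept: this region is class "a"
        clf.learn_one(point, "a")
    print(clf.predict_one((0.05, 0.05)))  # -> a
    for point in region:  # drift: the same region is now class "b"
        clf.learn_one(point, "b")
    print(clf.predict_one((0.05, 0.05)))  # -> b
```

Evicting the oldest record is the simplest possible forgetting policy: it gives up some accuracy on a stable concept in exchange for quick recovery after drift, and the window size controls that trade-off as well as the memory bound.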




Copyright information

© 2007 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Gaber, M.M., Zaslavsky, A., Krishnaswamy, S. (2007). A Survey of Classification Methods in Data Streams. In: Aggarwal, C.C. (eds) Data Streams. Advances in Database Systems, vol 31. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-47534-9_3


  • DOI: https://doi.org/10.1007/978-0-387-47534-9_3

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-28759-1

  • Online ISBN: 978-0-387-47534-9

  • eBook Packages: Computer Science, Computer Science (R0)
