Lacking Labels in the Stream: Classifying Evolving Stream Data with Few Labels

  • Conference paper
Foundations of Intelligent Systems (ISMIS 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5722)

Abstract

This paper outlines a data stream classification technique that addresses the problem of insufficient and biased labeled data. It is practical to assume that only a small fraction of the instances in a stream are labeled. A more realistic assumption is that the labeled data may not be independently distributed among all training documents. How can we ensure that a good classification model is built in these scenarios, given that the data stream also evolves over time? In our previous work, we applied semi-supervised clustering to build classification models using a limited amount of labeled training data. However, that work assumed that the data to be labeled are chosen randomly. In our current work, we relax this assumption and propose a label propagation framework for data streams that can build good classification models even if the data are not labeled randomly. Comparison with state-of-the-art stream classification techniques on synthetic and benchmark real data demonstrates the effectiveness of our approach.
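
The abstract does not give the details of the proposed framework, so the following is only a minimal sketch of generic graph-based label propagation on a single chunk of stream data, not the authors' method. The function name `propagate_labels`, the RBF affinity, and the parameters `sigma`, `alpha`, and `n_iter` are all assumptions made for illustration. It shows how a few labels, even when they are not chosen randomly, can spread to unlabeled neighbors through a similarity graph.

```python
import numpy as np

def propagate_labels(X, y, n_classes, sigma=1.0, alpha=0.9, n_iter=50):
    """Generic label spreading on one chunk (illustrative, hypothetical helper).

    X: (n, d) feature array; y: (n,) integer labels, with -1 for unlabeled.
    Returns a predicted class index for every instance in the chunk.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Pairwise squared distances and an RBF affinity matrix (assumed kernel).
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops

    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}.
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # One-hot seed matrix: rows of unlabeled instances stay all zero.
    Y0 = np.zeros((len(y), n_classes))
    labeled = y >= 0
    Y0[labeled, y[labeled]] = 1.0

    # Iteratively mix neighbor scores with the original seed labels.
    F = Y0.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * Y0
    return F.argmax(axis=1)
```

In a streaming setting, one would presumably run such a step per chunk and retrain or replace models as the data evolve; how the paper handles this is not stated in the abstract.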


Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Woolam, C., Masud, M.M., Khan, L. (2009). Lacking Labels in the Stream: Classifying Evolving Stream Data with Few Labels. In: Rauch, J., Raś, Z.W., Berka, P., Elomaa, T. (eds) Foundations of Intelligent Systems. ISMIS 2009. Lecture Notes in Computer Science, vol 5722. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04125-9_58

  • DOI: https://doi.org/10.1007/978-3-642-04125-9_58

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04124-2

  • Online ISBN: 978-3-642-04125-9

  • eBook Packages: Computer Science, Computer Science (R0)
