Lacking Labels in the Stream: Classifying Evolving Stream Data with Few Labels

Woolam, Clay; Masud, Mohammad M.; Khan, Latifur

doi:10.1007/978-3-642-04125-9_58

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5722)

Included in the following conference series:

International Symposium on Methodologies for Intelligent Systems

Abstract

This paper outlines a data stream classification technique that addresses the problem of insufficient and biased labeled data. It is practical to assume that only a small fraction of the instances in a stream are labeled. A more realistic assumption still is that the labeled data are not independently distributed among all training documents. How can we ensure that a good classification model is built in these scenarios, given that the data stream also evolves over time? In our previous work, we applied semi-supervised clustering to build classification models from a limited amount of labeled training data. However, that approach assumed that the data to be labeled are chosen randomly. In the current work, we relax this assumption and propose a label propagation framework for data streams that can build good classification models even if the data are not labeled randomly. Comparison with state-of-the-art stream classification techniques on synthetic and benchmark real data demonstrates the effectiveness of our approach.
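
To make the idea of label propagation concrete, the following is a minimal sketch of generic graph-based label propagation on a single chunk of stream data. It is not the framework proposed in the paper, whose details are not given in the abstract. The function name `propagate_labels`, the Gaussian similarity with bandwidth `sigma`, the fixed iteration count, and the toy chunk are all assumptions made for illustration.

```python
import math


def propagate_labels(points, labels, n_classes, sigma=1.0, iters=50):
    """Generic label propagation on one chunk (illustrative assumption only).

    labels[i] is a class index for labeled points and None for unlabeled ones.
    Returns a predicted class index for every point in the chunk.
    """
    n = len(points)
    # Gaussian similarity between every pair of points (no self-loops).
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))

    # Initial scores: one-hot for labeled points, uniform for unlabeled ones.
    def initial(i):
        if labels[i] is None:
            return [1.0 / n_classes] * n_classes
        return [1.0 if c == labels[i] else 0.0 for c in range(n_classes)]

    f = [initial(i) for i in range(n)]
    for _ in range(iters):
        new_f = []
        for i in range(n):
            total = sum(w[i])
            if labels[i] is not None or total == 0.0:
                new_f.append(f[i])  # clamp known labels; skip isolated points
                continue
            # Each unlabeled point takes the similarity-weighted average
            # of its neighbours' current class scores.
            new_f.append([
                sum(w[i][j] * f[j][c] for j in range(n)) / total
                for c in range(n_classes)
            ])
        f = new_f
    return [max(range(n_classes), key=lambda c: f[i][c]) for i in range(n)]


# Toy chunk: two well-separated groups, one labeled point in each.
chunk = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (3.0, 3.0), (3.2, 2.9), (2.9, 3.1)]
chunk_labels = [0, None, None, 1, None, None]
predicted = propagate_labels(chunk, chunk_labels, n_classes=2)
print(predicted)
```

In this toy chunk only two of six points carry labels, and they were not chosen at random. The unlabeled points still inherit the class of the labeled point in their own group, because similarity, rather than the sampling of labels, drives the propagation.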

Author information

Authors and Affiliations

Department of Computer Science, University of Texas at Dallas, USA
Clay Woolam, Mohammad M. Masud & Latifur Khan

Editor information

Editors and Affiliations

Faculty of Informatics and Statistics, University of Economics, W. Churchill Sq. 4, 130 67, Prague 3, Czech Republic
Jan Rauch
Department of Computer Science, University of North Carolina, NC 27599-3175, Charlotte, USA
Zbigniew W. Raś
Faculty of Informatics and Statistics, University of Economics, W. Churchill Sq. 4, 130 67, Prague, Czech Republic
Petr Berka
Institute of Software Systems, Tampere University of Technology, P. O. Box 553, 33101, Tampere, Finland
Tapio Elomaa

About this paper

Cite this paper

Woolam, C., Masud, M.M., Khan, L. (2009). Lacking Labels in the Stream: Classifying Evolving Stream Data with Few Labels. In: Rauch, J., Raś, Z.W., Berka, P., Elomaa, T. (eds) Foundations of Intelligent Systems. ISMIS 2009. Lecture Notes in Computer Science, vol 5722. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04125-9_58

DOI: https://doi.org/10.1007/978-3-642-04125-9_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04124-2
Online ISBN: 978-3-642-04125-9
eBook Packages: Computer Science, Computer Science (R0)