Efficiently Maintaining the Performance of an Ensemble Classifier in Streaming Data

Ryu, Joung Woo; Kantardzic, Mehmed M.; Kim, Myung-Won

doi:10.1007/978-3-642-32645-5_67

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7425)

Included in the following conference series:

International Conference on Hybrid Information Technology

Abstract

In data stream environments, classifiers are generally refined at regular time intervals or after a fixed number of streaming data items have arrived. The correct labels of all unlabeled streaming data are also typically used in the refinement process. Such an approach is not feasible in many real-world applications, where classifiers must be improved using data labeled by human experts. In this paper, we select data for refining a classifier from streaming data in an online process. Our selection methodology uses training data and is applied to build an ensemble of classifiers over streaming data. We compared the results of our ensemble approach with those of a conventional ensemble approach in which new classifiers for the ensemble are generated periodically. In experiments with ten benchmark data sets, including three real streaming data sets, our ensemble approach generated, on average, 2.4% of the classifiers and used 10.0% of the labeled data required by the conventional ensemble approach, while producing comparable classification accuracy.
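The general idea of requesting labels only for selected streaming data, rather than for every item, can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so everything below is an assumption made for illustration: the nearest-centroid base learner, the distance-based `is_unfamiliar` rule, the `slack` and `batch_size` parameters, and the voting scheme are hypothetical stand-ins, not the authors' method.

```python
import math
from collections import Counter, defaultdict


class CentroidClassifier:
    """Toy nearest-centroid learner (hypothetical stand-in for any base classifier)."""

    def __init__(self, samples, labels):
        sums = defaultdict(lambda: [0.0] * len(samples[0]))
        counts = Counter(labels)
        for x, y in zip(samples, labels):
            sums[y] = [s + v for s, v in zip(sums[y], x)]
        self.centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}
        # Largest distance from a training point to its own class centroid.
        self.radius = max(
            math.dist(x, self.centroids[y]) for x, y in zip(samples, labels)
        )

    def predict(self, x):
        return min(self.centroids, key=lambda y: math.dist(x, self.centroids[y]))

    def is_unfamiliar(self, x, slack=1.5):
        # Assumed selection rule: x lies far from everything seen in training.
        nearest = min(math.dist(x, c) for c in self.centroids.values())
        return nearest > slack * self.radius


class SelectiveEnsemble:
    """Adds a new member only after enough selected items have been labeled."""

    def __init__(self, samples, labels, batch_size=4):
        self.members = [CentroidClassifier(samples, labels)]
        self.batch_size = batch_size
        self.pending = []  # selected items together with their expert labels
        self.labels_requested = 0

    def predict(self, x):
        # Members that recognise x vote; if none does, every member votes.
        voters = [m for m in self.members if not m.is_unfamiliar(x)] or self.members
        return Counter(m.predict(x) for m in voters).most_common(1)[0][0]

    def observe(self, x, ask_expert):
        prediction = self.predict(x)  # classify first, refine afterwards
        if all(m.is_unfamiliar(x) for m in self.members):
            self.pending.append((x, ask_expert(x)))
            self.labels_requested += 1
            if len(self.pending) >= self.batch_size:
                xs, ys = zip(*self.pending)
                self.members.append(CentroidClassifier(list(xs), list(ys)))
                self.pending = []
        return prediction


def expert(x):
    """Hypothetical labeling oracle; the concept flips for points with x[1] > 5."""
    left = x[0] < 2.5
    if x[1] > 5:
        return "b" if left else "a"
    return "a" if left else "b"


def run_demo():
    train = [(0, 0), (0, 1), (1, 0), (4, 0), (4, 1), (5, 0)]
    ensemble = SelectiveEnsemble(train, [expert(x) for x in train])
    stream = [(0.5, 0.5), (4.5, 0.5), (0, 10), (1, 10), (4, 10),
              (5, 10), (0.5, 10), (4.5, 10), (0.2, 0.2)]
    predictions = [ensemble.observe(x, expert) for x in stream]
    return predictions, ensemble


if __name__ == "__main__":
    predictions, ensemble = run_demo()
    print(predictions)
    print(ensemble.labels_requested, "labels requested,",
          len(ensemble.members), "ensemble members")
```

In this toy stream, labels are requested only for items that no current member recognises, and a new member is trained once `batch_size` such items have been labeled. A periodic baseline would instead request a label for every item and train a member per fixed-size chunk, which is the kind of cost the abstract contrasts against.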

Author information

Authors and Affiliations

Technical Research Center, Safetia Inc., Seoul, 137-895, South Korea
Joung Woo Ryu
CECS Department, Speed School of Engineering, University of Louisville, KY, 40292, USA
Mehmed M. Kantardzic
Department of Computer Science, Soongsil University, Seoul, 156-743, South Korea
Myung-Won Kim

Editor information

Editors and Affiliations

Dept of Computer Engineering, Hannam University, Korea
Geuk Lee
Computer Science and Information System, University of Limerick, Limerick, Ireland
Daniel Howard
Department of Information and Communication, Dong Seoul University, 423 Bokjeong-Dong, Sujeong-Gu, Seongnam, Gyunggi, Korea
Jeong Jin Kang
Institute of Mathematics, University of Warsaw, ul. Banacha 2, 02-097, Warsaw, Poland
Dominik Ślęzak

About this paper

Cite this paper

Ryu, J.W., Kantardzic, M.M., Kim, M.-W. (2012). Efficiently Maintaining the Performance of an Ensemble Classifier in Streaming Data. In: Lee, G., Howard, D., Kang, J.J., Ślęzak, D. (eds) Convergence and Hybrid Information Technology. ICHIT 2012. Lecture Notes in Computer Science, vol 7425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32645-5_67

DOI: https://doi.org/10.1007/978-3-642-32645-5_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32644-8
Online ISBN: 978-3-642-32645-5
eBook Packages: Computer Science, Computer Science (R0)