Classifying Noisy Data Streams

Wang, Yong; Li, Zhanhuai; Zhang, Yang

doi:10.1007/11881599_65

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4223)

Included in the following conference series:

International Conference on Fuzzy Systems and Knowledge Discovery

Abstract

The two main challenges in mining data streams are concept drift and data noise. Current algorithms rely mainly on the robustness of the base classifier or of the learning ensemble and have no active mechanism for dealing with noise, so noise can still cause drastic drops in accuracy. In this paper, we present a clustering-based method to filter hard instances and noise instances out of data streams. We also propose a trigger to detect concept drift, and we build RobustBoosting, an ensemble classifier, by boosting the hard instances. We evaluated the RobustBoosting algorithm and the AdaptiveBoosting algorithm [1] on synthetic and real-life data sets. The experimental results show that the proposed method has a substantial advantage over AdaptiveBoosting in prediction accuracy, and that it converges to target concepts efficiently and with high accuracy on data sets with noise levels as high as 40%.
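
The abstract does not say how the clustering-based filter works, so the following is only a generic illustration of the idea, not the RobustBoosting procedure. It assumes a plain k-means clustering of one chunk of the stream and flags an instance as suspect when its label disagrees with the majority label of its cluster. The function names, the choice of k-means, and the majority-vote rule are all assumptions made for this sketch.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to point (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))

def kmeans(points, k, iters=10):
    """Plain k-means with a deterministic start: the first k points."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous center
                centers[i] = [sum(c) / len(group) for c in zip(*group)]
    return centers

def split_by_cluster_agreement(points, labels, k):
    """Return (clean, suspect) index lists for one chunk of the stream.

    Assumption of this sketch: an instance is suspect when its label
    differs from the majority label of the cluster it falls into.
    """
    centers = kmeans(points, k)
    assign = [nearest(p, centers) for p in points]
    majority = {}
    for c in set(assign):
        votes = Counter(l for a, l in zip(assign, labels) if a == c)
        majority[c] = votes.most_common(1)[0][0]
    clean, suspect = [], []
    for i, (a, l) in enumerate(zip(assign, labels)):
        (clean if l == majority[a] else suspect).append(i)
    return clean, suspect

# Toy chunk: two well-separated groups; the last point sits in the
# first group but carries the second group's label.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (0.5, 0.5)]
labels = ["a", "a", "a", "b", "b", "b", "b"]
clean, suspect = split_by_cluster_agreement(pts, labels, k=2)
print(suspect)  # index 6 is flagged
```

A filter of this kind cannot by itself tell a mislabeled instance from a genuinely hard one near a class boundary, which is presumably why the abstract treats hard instances and noise instances as separate categories.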

This research is supported by NSF 60373108.

References

1. Chu, F., Zaniolo, C.: Fast and Light Boosting for Adaptive Mining of Data Streams. In: Dai, H., Srikant, R., Zhang, C. (eds.) PAKDD 2004. LNCS (LNAI), vol. 3056. Springer, Heidelberg (2004)
2. Domingos, P., Hulten, G.: Mining High-Speed Data Streams. In: Proc. of the Sixth International Conference on Knowledge Discovery and Data Mining, pp. 71–80 (2000)
3. Hulten, G., Spencer, L., Domingos, P.: Mining Time-Changing Data Streams. In: ACM SIGKDD (2001)
4. Last, M.: Online Classification of Nonstationary Data Stream. Intelligent Data Analysis 6, 129–147 (2002)
5. Kuncheva, L.I.: Classifier Ensembles for Changing Environments. In: Proc. 5th Int. Workshop on Multiple Classifier Systems, pp. 1–15 (2004)
6. Street, W., Kim, Y.: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification. In: Int'l. Conf. on Knowledge Discovery and Data Mining (2001)
7. Wang, H., Fan, W., Yu, P.S., Han, J.: Mining Concept-Drifting Data Streams Using Ensemble Classifiers. In: Int'l. Conf. on Knowledge Discovery and Data Mining (2003)
8. Kubica, J., Moore, A.: Probabilistic Noise Identification and Data Cleaning. In: Int'l. Conf. Data Mining (2003)
9. Zhu, X., Wu, X., Chen, Q.: Eliminating Class Noise in Large Datasets. In: Proc. of the 20th International Conf. on Machine Learning (2003)
10. Widmer, G., Kubat, M.: Learning in the Presence of Concept Drift and Hidden Contexts. Machine Learning 23, 69–101 (1996)
11. Fan, W.: Systematic Data Selection to Mine Concept-Drifting Data Streams. In: Proc. of the Conf. KDD, pp. 128–137 (2004)
12. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In: Proc. of Int. Conf. on Knowledge Discovery and Data Mining, pp. 226–231 (1996)
13. Kalai, A., Servedio, R.A.: Boosting in the Presence of Noise. Journal of Computer and System Sciences 71(3), 226–290 (2003)

Author information

Authors and Affiliations

Dept. Computer Science & Software, Northwestern Polytechnical University, P.R. China
Yong Wang & Zhanhuai Li
School of Information Engineering, Northwest A&F University, P.R. China
Yang Zhang

Editor information

Editors and Affiliations

School of Electrical and Electronic Engineering, Nanyang Technological University, Block S1, Nanyang Avenue, 639798, Singapore
Lipo Wang
Life Science Research Center, School of Electronic Engineering, Xidian University, 710071, Xi’an, Shaanxi, China
Licheng Jiao
School of Electrical and Electronic Engineering, Xidian University, 710071, Xi’an, China
Guanming Shi
School of Information Technology and Electrical Engineering, The University of Queensland, 4072, Brisbane, Queensland, Australia
Xue Li
College of Mathematics and Information Science, Hebei Normal University, 050016, Shijiazhuang, Hebei, P.R. China
Jing Liu

About this paper

Cite this paper

Wang, Y., Li, Z., Zhang, Y. (2006). Classifying Noisy Data Streams. In: Wang, L., Jiao, L., Shi, G., Li, X., Liu, J. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2006. Lecture Notes in Computer Science, vol 4223. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11881599_65

DOI: https://doi.org/10.1007/11881599_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45916-3
Online ISBN: 978-3-540-45917-0
eBook Packages: Computer Science, Computer Science (R0)
