Mining Noisy Data Streams via a Discriminative Model

Chu, Fang; Wang, Yizhou; Zaniolo, Carlo

doi:10.1007/978-3-540-30214-8_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3245)

Included in the following conference series:

International Conference on Discovery Science

Abstract

The two main challenges typically associated with mining data streams are concept drift and data contamination. To address these challenges, we seek learning techniques and models that are robust to noise and can adapt to changes in a timely fashion. In this paper, we approach the stream-mining problem using a statistical estimation framework, and propose a discriminative model for fast mining of noisy data streams. We build an ensemble of classifiers and achieve adaptation by weighting the classifiers in a way that maximizes the likelihood of the data. We further employ robust statistical techniques to alleviate the problem of noise sensitivity. Experimental results on both synthetic and real-life data sets demonstrate the effectiveness of this new discriminative model.
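The idea of weighting ensemble members to maximize the likelihood of the data can be illustrated with a small sketch. It is not the authors' algorithm: it assumes the ensemble is treated as a plain mixture of classifiers whose weights lie on the probability simplex, and it fits those weights with a standard EM update. The function names `fit_ensemble_weights` and `log_likelihood`, and the input matrix `probs`, are hypothetical and chosen only for illustration. The robust-statistics step mentioned in the abstract is not modeled here.

```python
import numpy as np


def log_likelihood(probs, w):
    """Log-likelihood of the observed labels under the weighted mixture.

    probs[i, k] is the probability classifier k assigns to the observed
    label of example i (an assumed input format, not from the paper).
    """
    mix = np.asarray(probs, dtype=float) @ np.asarray(w, dtype=float)
    return float(np.log(np.maximum(mix, 1e-300)).sum())


def fit_ensemble_weights(probs, n_iter=200, tol=1e-9):
    """Fit mixture weights by EM so that the data likelihood is maximized."""
    probs = np.asarray(probs, dtype=float)
    n_examples, n_classifiers = probs.shape
    w = np.full(n_classifiers, 1.0 / n_classifiers)  # start from uniform weights
    prev = -np.inf
    for _ in range(n_iter):
        mix = np.maximum(probs @ w, 1e-300)  # mixture probability per example
        ll = float(np.log(mix).sum())
        if ll - prev < tol:  # EM never decreases ll, so stop once it stalls
            break
        prev = ll
        # E-step: responsibility of each classifier for each example.
        resp = probs * w / mix[:, None]
        # M-step: a weight is the average responsibility of its classifier.
        w = resp.mean(axis=0)
    return w


if __name__ == "__main__":
    # Toy data: classifier 0 is consistently better than classifier 1.
    probs = np.array([[0.9, 0.2], [0.8, 0.3], [0.7, 0.4], [0.9, 0.1]])
    w = fit_ensemble_weights(probs)
    print("weights:", w)
    print("log-likelihood:", log_likelihood(probs, w))
```

In a streaming setting, one might refit the weights on each new chunk of labeled data so that classifiers trained on outdated concepts lose weight; that scheduling choice is also an assumption of this sketch.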

Author information

Authors and Affiliations

University of California, Los Angeles, CA, 90095, USA
Fang Chu, Yizhou Wang & Carlo Zaniolo

Editor information

Editors and Affiliations

Department of Informatics, Graduate School of Information Science and Electrical Engineering, Kyushu University, 744 Motooka, Nishi, 819-0395, Fukuoka, Japan
Einoshin Suzuki
Kyushu University, 6–10–1 Hakozaki Higashi-ku, 812–8581, Fukuoka, Japan
Setsuo Arikawa

Cite this paper

Chu, F., Wang, Y., Zaniolo, C. (2004). Mining Noisy Data Streams via a Discriminative Model. In: Suzuki, E., Arikawa, S. (eds) Discovery Science. DS 2004. Lecture Notes in Computer Science, vol 3245. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30214-8_4

DOI: https://doi.org/10.1007/978-3-540-30214-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23357-2
Online ISBN: 978-3-540-30214-8
eBook Packages: Springer Book Archive
