Abstract
Mining data streams has recently become an important and challenging task for many real-world applications, such as credit card fraud detection and sensor networks. One popular solution is to partition the stream into chunks, learn a base classifier from each chunk, and then integrate all base classifiers for classification. In this paper, we propose a new dynamic classifier selection (DCS) mechanism that integrates base classifiers for effective mining from data streams. The proposed algorithm dynamically selects a single "best" classifier for each test instance at run time. Our scheme uses statistical information from attribute values: each attribute partitions the evaluation set into disjoint subsets, and the classification accuracy of each base classifier is evaluated on these subsets. Given a test instance, its attribute values identify the subsets of the evaluation set that contain similar instances, and the classifier with the highest accuracy on those subsets is selected to classify the instance. Experimental results and comparative studies demonstrate the efficiency and efficacy of our method. Such a DCS scheme appears promising for mining data streams with dramatic concept drift or a significant amount of noise, where the base classifiers are likely to conflict or have low confidence.
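The selection mechanism described above can be illustrated with a minimal sketch. This is not the authors' exact procedure, only an assumed reading of it: each (attribute, value) pair partitions the evaluation set into a subset, per-classifier accuracy is recorded on every subset, and a test instance is routed to the classifier with the best summed accuracy over the subsets its own attribute values select. All function and variable names here are hypothetical.

```python
from collections import defaultdict

def build_accuracy_table(classifiers, eval_X, eval_y):
    """Map (attr_index, attr_value) -> list of per-classifier accuracies
    on the evaluation-set subset sharing that attribute value."""
    subsets = defaultdict(list)
    for i, x in enumerate(eval_X):
        for a, v in enumerate(x):
            subsets[(a, v)].append(i)          # disjoint subsets per attribute
    acc = {}
    for key, idxs in subsets.items():
        acc[key] = []
        for clf in classifiers:
            correct = sum(clf(eval_X[i]) == eval_y[i] for i in idxs)
            acc[key].append(correct / len(idxs))
    return acc

def select_and_classify(classifiers, acc, x):
    """Select the single 'best' classifier for test instance x at run time,
    then return its prediction."""
    scores = [0.0] * len(classifiers)
    hits = 0
    for a, v in enumerate(x):
        if (a, v) in acc:                      # only subsets seen in evaluation
            hits += 1
            for c, s in enumerate(acc[(a, v)]):
                scores[c] += s
    # Fall back to the first classifier if no matching subset exists.
    best = max(range(len(classifiers)), key=lambda c: scores[c]) if hits else 0
    return classifiers[best](x)
```

With two toy base classifiers (one constant, one that echoes an attribute), the instance is routed to whichever classifier was more accurate on the evaluation subset sharing its attribute values.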
Author information
Additional information
A preliminary version of this paper was published in the Proceedings of the 4th IEEE International Conference on Data Mining, pp 305–312, Brighton, UK
Xingquan Zhu received his Ph.D. degree in Computer Science from Fudan University, Shanghai, China, in 2001. He spent four months with Microsoft Research Asia, Beijing, China, where he worked on content-based image retrieval with relevance feedback. From 2001 to 2002, he was a Postdoctoral Associate in the Department of Computer Science, Purdue University, West Lafayette, IN. He is currently a Research Assistant Professor in the Department of Computer Science, University of Vermont, Burlington, VT. His research interests include data mining, machine learning, data quality, multimedia computing, and information retrieval. Since 2000, Dr. Zhu has published extensively, including over 40 refereed papers in various journals and conference proceedings.
Xindong Wu is a Professor and the Chair of the Department of Computer Science at the University of Vermont. He holds a Ph.D. in Artificial Intelligence from the University of Edinburgh, Britain. His research interests include data mining, knowledge-based systems, and Web information exploration. He has published extensively in these areas in various journals and conferences, including IEEE TKDE, TPAMI, ACM TOIS, IJCAI, ICML, KDD, ICDM, and WWW, as well as 11 books and conference proceedings. Dr. Wu is the Editor-in-Chief of the IEEE Transactions on Knowledge and Data Engineering (by the IEEE Computer Society), the founder and current Steering Committee Chair of the IEEE International Conference on Data Mining (ICDM), an Honorary Editor-in-Chief of Knowledge and Information Systems (by Springer), and a Series Editor of the Springer Book Series on Advanced Information and Knowledge Processing (AI&KP). He is the 2004 ACM SIGKDD Service Award winner.
Ying Yang received her Ph.D. in Computer Science from Monash University, Australia, in 2003. Following academic appointments at the University of Vermont, USA, she is currently a Research Fellow at Monash University, Australia. Dr. Yang is recognized for contributions in the fields of machine learning and data mining. She has published many scientific papers and book chapters on adaptive learning, proactive mining, noise cleansing, and discretization. Contact her at yyang@mail.csse.monash.edu.au.
Cite this article
Zhu, X., Wu, X. & Yang, Y. Effective classification of noisy data streams with attribute-oriented dynamic classifier selection. Knowl Inf Syst 9, 339–363 (2006). https://doi.org/10.1007/s10115-005-0212-y