Effective classification of noisy data streams with attribute-oriented dynamic classifier selection

  • Regular Paper
  • Published in Knowledge and Information Systems

Abstract

Recently, mining data streams has become an important and challenging task for many real-world applications such as credit card fraud protection and sensor networks. One popular solution is to divide the stream into chunks, learn a base classifier from each chunk, and then integrate all base classifiers for effective classification. In this paper, we propose a new dynamic classifier selection (DCS) mechanism to integrate base classifiers for effective mining from data streams. The proposed algorithm dynamically selects a single "best" classifier to classify each test instance at run time. Our scheme uses statistical information from attribute values: each attribute is used to partition the evaluation set into disjoint subsets, and the classification accuracy of each base classifier is then evaluated on these subsets. Given a test instance, its attribute values determine the subsets of the evaluation set that contain similar instances, and the classifier with the highest classification accuracy on those subsets is selected to classify the test instance. Experimental results and comparative studies demonstrate the efficiency and efficacy of our method. Such a DCS scheme appears especially promising for mining data streams with dramatic concept drifting or a significant amount of noise, where the base classifiers are likely to conflict or to have low confidence.
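The selection step is straightforward to prototype. Below is a minimal sketch of the attribute-oriented DCS idea described above, not the authors' implementation: it assumes discrete (or pre-discretized) attribute values, scikit-learn-style base learners, and hypothetical helper names (`train_chunk_classifiers`, `evaluate_partitions`, `classify_instance`).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_chunk_classifiers(chunks):
    """Learn one base classifier per stream chunk; each chunk is an (X, y) pair."""
    return [DecisionTreeClassifier().fit(X, y) for X, y in chunks]

def evaluate_partitions(models, X_eval, y_eval):
    """Use each attribute to split the evaluation set into disjoint subsets
    (one subset per attribute value) and record every base classifier's
    accuracy on every subset."""
    accuracy = {}  # (attribute index, attribute value) -> per-model accuracies
    for a in range(X_eval.shape[1]):
        for v in np.unique(X_eval[:, a]):
            mask = X_eval[:, a] == v
            accuracy[(a, v)] = np.array(
                [m.score(X_eval[mask], y_eval[mask]) for m in models])
    return accuracy

def classify_instance(models, accuracy, x):
    """Dynamically select the single 'best' classifier for test instance x:
    sum each model's accuracy over the evaluation subsets indexed by x's
    attribute values, then let the winning model predict."""
    scores = np.zeros(len(models))
    matched = False
    for a, v in enumerate(x):
        if (a, v) in accuracy:
            scores += accuracy[(a, v)]
            matched = True
    best = int(np.argmax(scores)) if matched else 0  # fallback: first model
    return models[best].predict(np.asarray(x).reshape(1, -1))[0]
```

For example, with chunks = [(X1, y1), (X2, y2)], one would call models = train_chunk_classifiers(chunks), then acc = evaluate_partitions(models, X_eval, y_eval), and finally classify_instance(models, acc, x) for each arriving test instance x. In the paper's streaming setting, the evaluation set would presumably be drawn from recent chunks so that the subset accuracies track the current concept.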



Author information


Additional information

A preliminary version of this paper was published in the Proceedings of the 4th IEEE International Conference on Data Mining (ICDM), Brighton, UK, pp 305–312.

Xingquan Zhu received his Ph.D. degree in Computer Science from Fudan University, Shanghai, China, in 2001. He spent four months with Microsoft Research Asia, Beijing, China, working on content-based image retrieval with relevance feedback. From 2001 to 2002, he was a Postdoctoral Associate in the Department of Computer Science, Purdue University, West Lafayette, IN. He is currently a Research Assistant Professor in the Department of Computer Science, University of Vermont, Burlington, VT. His research interests include data mining, machine learning, data quality, multimedia computing, and information retrieval. Since 2000, Dr. Zhu has published extensively, including over 40 refereed papers in various journals and conference proceedings.

Xindong Wu is a Professor and the Chair of the Department of Computer Science at the University of Vermont. He holds a Ph.D. in Artificial Intelligence from the University of Edinburgh, Britain. His research interests include data mining, knowledge-based systems, and Web information exploration. He has published extensively in these areas in various journals and conferences, including IEEE TKDE, TPAMI, ACM TOIS, IJCAI, ICML, KDD, ICDM, and WWW, as well as 11 books and conference proceedings. Dr. Wu is the Editor-in-Chief of the IEEE Transactions on Knowledge and Data Engineering (by the IEEE Computer Society), the founder and current Steering Committee Chair of the IEEE International Conference on Data Mining (ICDM), an Honorary Editor-in-Chief of Knowledge and Information Systems (by Springer), and a Series Editor of the Springer Book Series on Advanced Information and Knowledge Processing (AI&KP). He is the 2004 ACM SIGKDD Service Award winner.

Ying Yang received her Ph.D. in Computer Science from Monash University, Australia, in 2003. Following academic appointments at the University of Vermont, USA, she is currently a Research Fellow at Monash University, Australia. Dr. Yang is recognized for contributions in the fields of machine learning and data mining. She has published many scientific papers and book chapters on adaptive learning, proactive mining, noise cleansing and discretization. Contact her at yyang@mail.csse.monash.edu.au.


About this article

Cite this article

Zhu, X., Wu, X. & Yang, Y. Effective classification of noisy data streams with attribute-oriented dynamic classifier selection. Knowl Inf Syst 9, 339–363 (2006). https://doi.org/10.1007/s10115-005-0212-y

