Dynamic classifier ensemble for positive unlabeled text stream classification

  • Regular Paper
Knowledge and Information Systems

Abstract

Most studies on streaming data classification assume that the data can be fully labeled. In real-life applications, however, manually labeling an entire stream for training is impractical and time-consuming. Commonly, only a small set of positive examples and a large amount of unlabeled data are available in data stream environments. In this setting, traditional streaming algorithms applied to a positive unlabeled stream with only straightforward adaptation may perform poorly. In this paper, we propose a Dynamic Classifier Ensemble method for Positive and Unlabeled text stream (DCEPU) classification. We address the problem of classifying positive and unlabeled text streams with various types of concept drift by constructing an appropriate validation set and designing a novel dynamic weighting scheme in the classification phase. Experimental results on the benchmark dataset RCV1-v2 demonstrate that DCEPU outperforms the existing LELC (Li et al. 2009b), DVS (with necessary adaptation) (Tsymbal et al. in Inf Fusion 9(1):56–68, 2008), and Stacking-style ensemble-based algorithm (Zhang et al. 2008b).
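
To make the idea of dynamic weighting concrete, the sketch below shows one generic way an ensemble can reweight its members for each incoming document, using each member's accuracy on the validation examples most similar to that document. It is an illustration only and not the DCEPU weighting scheme, whose details are not given in the abstract. The function names (`cosine`, `dynamic_weights`, `predict`), the neighbourhood size `k`, and the assumption that a labeled validation set is already available are all hypothetical.

```python
from math import sqrt
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Dict[str, float]               # sparse term-weight vector for one document
Classifier = Callable[[Vector], int]    # base classifier returning +1 or -1
Example = Tuple[Vector, int]            # (document, label) pair in the validation set


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def dynamic_weights(classifiers: Sequence[Classifier],
                    validation: Sequence[Example],
                    x: Vector, k: int = 3) -> List[float]:
    """Weight each classifier by its accuracy on the k validation
    examples most similar to x (assumed, generic local-accuracy rule)."""
    neighbours = sorted(validation, key=lambda ex: cosine(x, ex[0]),
                        reverse=True)[:k]
    weights = []
    for clf in classifiers:
        correct = sum(1 for doc, label in neighbours if clf(doc) == label)
        weights.append(correct / len(neighbours) if neighbours else 0.0)
    return weights


def predict(classifiers: Sequence[Classifier],
            validation: Sequence[Example],
            x: Vector, k: int = 3) -> int:
    """Weighted vote of the ensemble; a non-positive score maps to -1."""
    weights = dynamic_weights(classifiers, validation, x, k)
    score = sum(w * clf(x) for w, clf in zip(weights, classifiers))
    return 1 if score > 0 else -1
```

Because the weights are recomputed per document, a base classifier trained on an outdated concept contributes little whenever it misclassifies the validation examples nearest to the current input. In a positive unlabeled setting the hard part is obtaining a trustworthy validation set in the first place, which this sketch simply takes as given.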

References

  1. Calvo B, Larranaga P, Lozano JA (2005) Learning Bayesian classifiers from positive and unlabeled examples. Pattern Recognit Lett 28(16): 2375–2384

  2. Cheng R, Kalashnikov D, Prabhakar S (2005) Learning from positive and unlabeled examples. Theor Comput Sci 38(1): 70–83

  3. Didaci L, Giacinto G, Roli F, Marcialis GL (2005) A study on the performances of dynamic classifier selection based on local accuracy estimation. Pattern Recognit 38(11): 2188–2191

  4. Dietterich TG (2002) Ensemble methods in machine learning. In: Proceedings of the first international workshop on multiple classifier systems, pp 1–15

  5. Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining (KDD’00). Boston, pp 71–80

  6. Fan W (2004) Systematic data selection to mine concept-drifting data streams. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (KDD’04). ACM Press, pp 128–137

  7. Fan W, Huang YA, Wang H, Yu PS (2004a) Active mining of data streams. In: Proceedings of the fourth SIAM international conference on data mining (SDM’04), pp 457–461

  8. Fan W, Huang YA, Yu PS (2004b) Decision tree evolution using limited number of labeled data items from drifting data streams. In: Proceedings of the fourth IEEE international conference on data mining (ICDM’04), pp 379–382

  9. Fung GPC, Yu JX, Lu H, Yu PS (2006) Text classification without negative examples revisit. IEEE Trans Knowl Data Eng 18(1): 6–20

  10. Grossi V, Turini F (2010) Stream mining: a novel architecture for ensemble-based classification. Knowl Inf Syst: 1–35. doi:10.1007/s10115-011-0378-4

  11. Huang S, Dong Y (2007) An active learning system for mining time-changing data streams. Intell Data Anal 11(4): 401–419

  12. Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining (KDD’01), pp 97–106

  13. Klinkenberg R, Joachims T (2000) Detecting concept drift with support vector machines. In: Proceedings of the seventeenth international conference on machine learning (ICML’00), pp 487–494

  14. Ko A, Sabourin R, Britto A Jr (2008) From dynamic classifier selection to dynamic ensemble selection. Pattern Recognit 41(5): 1718–1731

  15. Kolter JZ, Maloof MA (2003) Dynamic weighted majority: a new ensemble method for tracking concept drift. In: Proceedings of the third international conference on data mining (ICDM’03), pp 123–130

  16. Lewis DD, Yang Y, Rose TG, Li F (2004) RCV1: a new benchmark collection for text categorization research. J Mach Learn Res 5: 361–397

  17. Li C, Zhang Y, Li X (2009a) OcVFDT: one-class very fast decision tree for one-class classification of data streams. In: Proceedings of the third international workshop on knowledge discovery from sensor data. Paris, pp 79–86

  18. Li X, Liu B (2003) Learning to classify texts using positive and unlabeled data. In: International joint conference on artificial intelligence (IJCAI’03), pp 587–594

  19. Li X, Liu B (2005) Learning from positive and unlabeled examples with different data distributions. In: Proceedings of European conference on machine learning (ECML’05), pp 218–229

  20. Li XL, Yu PS, Liu B, Ng SK (2009b) Positive unlabeled learning for data stream classification. In: Proceedings of the ninth SIAM international conference on data mining (SDM’09), pp 257–268

  21. Liu B, Lee WS, Yu PS, Li X (2002) Partially supervised classification of text documents. In: Proceedings of the nineteenth international conference on machine learning (ICML’02)

  22. Liu B, Dai Y, Li X, Lee WS, Yu PS (2003) Building text classifiers using positive and unlabeled examples. In: Proceedings of the third IEEE international conference on data mining (ICDM’03), pp 179–186

  23. Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Inf Process Manag 24(5): 513–523

  24. Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13(7): 1443–1471

  25. Sebastiani F (2002) Machine learning in automated text categorization. ACM Comput Surv 34(1): 1–47

  26. Street W, Kim Y (2001) A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of the seventh international conference on knowledge discovery and data mining (KDD’01), pp 377–382

  27. Tsymbal A, Pechenizkiy M, Cunningham P, Puuronen S (2008) Dynamic integration of classifiers for handling concept drift. Inf Fusion 9(1): 56–68

  28. Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the ninth international conference on knowledge discovery and data mining (KDD’03), pp 226–235

  29. Widmer G, Kubat M (1993) Effective learning in dynamic environments by explicit context tracking. In: European conference on machine learning (ECML’93). Springer, Berlin, pp 227–243

  30. Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1): 69–101

  31. Widyantoro D, Yen J (2005) Relevant data expansion for learning concept drift from sparsely labeled data. IEEE Trans Knowl Data Eng 17(3): 401–412

  32. Woods K, Kegelmeyer WP Jr, Bowyer K (1997) Combination of multiple classifiers using local accuracy estimates. IEEE Trans Pattern Anal Mach Intell 19(4): 405–410

  33. Wozniak M (2010) A hybrid decision tree training method using data streams. Knowl Inf Syst: 1–13. doi:10.1007/s10115-010-0345-5

  34. Wu S, Yang C, Zhou J (2006) Clustering-training for data stream mining. In: Proceedings of the sixth IEEE international conference on data mining workshops (ICDMW’06), pp 653–656

  35. Yu H, Han J, Chang KCC (2004) PEBL: web page classification without negative examples. IEEE Trans Knowl Data Eng 16(1): 70–81

  36. Zhang B, Zuo W (2008) Learning from positive and unlabeled examples: a survey. In: International symposiums on information processing, IEEE Computer Society, Los Alamitos, pp 650–654

  37. Zhang P, Zhu X, Shi Y (2008a) Categorizing and mining concept drifting data streams. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’08). Las Vegas, pp 812–820

  38. Zhang Y, Jin X (2006) An automatic construction and organization strategy for ensemble learning on data streams. ACM SIGMOD Rec 35(3): 28–33

  39. Zhang Y, Li X, Orlowska M (2008b) One-class classification of text streams with concept drift. In: Proceedings of the 2008 IEEE international conference on data mining workshops (ICDMW’08), pp 116–125

  40. Zhou ZH, Wu J, Tang W (2002) Ensembling neural networks: many could be better than all. Artif Intell 137(1–2): 239–263

  41. Zhu X, Wu X, Yang Y (2006) Effective classification of noisy data streams with attribute-oriented dynamic classifier selection. Knowl Inf Syst 9(3): 339–363

  42. Zhu X, Zhang P, Lin X, Shi Y (2007) Active learning from data streams. In: Proceedings of the seventh international conference on data mining (ICDM’07), pp 757–762

  43. Zhu X, Ding W, Yu P, Zhang C (2010) One-class learning and concept summarization for data streams. Knowl Inf Syst: 1–31. doi:10.1007/s10115-010-0331-y

Author information

Corresponding author

Correspondence to Yang Zhang.

About this article

Cite this article

Pan, S., Zhang, Y. & Li, X. Dynamic classifier ensemble for positive unlabeled text stream classification. Knowl Inf Syst 33, 267–287 (2012). https://doi.org/10.1007/s10115-011-0469-2

  • DOI: https://doi.org/10.1007/s10115-011-0469-2
