Dynamic classifier ensemble for positive unlabeled text stream classification

Pan, Shirui; Zhang, Yang; Li, Xue

doi:10.1007/s10115-011-0469-2

Regular Paper
Published: 23 December 2011

Volume 33, pages 267–287 (2012)

Knowledge and Information Systems

Shirui Pan¹,
Yang Zhang¹ &
Xue Li²

Abstract

Most studies on streaming data classification assume that the data can be fully labeled. However, in real-life applications, it is impractical and time-consuming to manually label the entire stream for training. It is very common that only a small set of positive examples and a large amount of unlabeled data are available in data stream environments. In this case, applying traditional streaming algorithms with straightforward adaptation to positive unlabeled streams may perform poorly. In this paper, we propose a Dynamic Classifier Ensemble method for Positive and Unlabeled text stream (DCEPU) classification scenarios. We address the problem of classifying positive and unlabeled text streams with various types of concept drift by constructing an appropriate validation set and designing a novel dynamic weighting scheme in the classification phase. Experimental results on the benchmark dataset RCV1-v2 demonstrate that the proposed method DCEPU outperforms the existing LELC (Li et al. 2009b), DVS (with necessary adaptation) (Tsymbal et al. in Inf Fusion 9(1):56–68, 2008), and Stacking style ensemble-based algorithm (Zhang et al. 2008b).
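The abstract names two ingredients, a validation set and a dynamic weighting scheme applied at classification time, but does not give their details. As a rough illustration of the general idea only, the following sketch weights each base classifier by its accuracy on the validation documents most similar to the incoming document, then takes a weighted vote. This is a generic local-accuracy weighting, not the DCEPU scheme itself. It assumes a validation set with known labels is already available; how such a set is built from positive and unlabeled data is exactly what the paper addresses and is not shown here. All names (`dynamic_weights`, `predict`, the toy classifiers) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]             # sparse term-weight vector
Classifier = Callable[[Vector], int]  # returns +1 (positive) or -1 (negative)
Labeled = Tuple[Vector, int]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def dynamic_weights(x: Vector, classifiers: List[Classifier],
                    validation: List[Labeled], k: int = 3) -> List[float]:
    """Weight each classifier by its accuracy on the k validation
    documents most similar to x (assumed: labels in `validation` are known)."""
    neighbours = sorted(validation, key=lambda ex: cosine(x, ex[0]),
                        reverse=True)[:k]
    if not neighbours:
        return [0.0 for _ in classifiers]
    weights = []
    for clf in classifiers:
        correct = sum(1 for doc, label in neighbours if clf(doc) == label)
        weights.append(correct / len(neighbours))
    return weights


def predict(x: Vector, classifiers: List[Classifier],
            validation: List[Labeled], k: int = 3) -> int:
    """Weighted vote of the base classifiers; ties go to the negative class."""
    weights = dynamic_weights(x, classifiers, validation, k)
    score = sum(w * clf(x) for w, clf in zip(weights, classifiers))
    return 1 if score > 0 else -1


# Toy base classifiers (hypothetical keyword rules, for illustration only).
def clf_sport(doc: Vector) -> int:
    return 1 if doc.get("goal", 0.0) > 0 else -1


def clf_always_pos(doc: Vector) -> int:
    return 1


validation_set: List[Labeled] = [
    ({"goal": 1.0, "match": 1.0}, 1),
    ({"stock": 1.0, "market": 1.0}, -1),
    ({"stock": 1.0, "bank": 1.0}, -1),
]
ensemble = [clf_sport, clf_always_pos]

finance_doc = {"stock": 1.0, "bank": 0.5}
finance_weights = dynamic_weights(finance_doc, ensemble, validation_set, k=2)
finance_label = predict(finance_doc, ensemble, validation_set, k=2)
sport_label = predict({"goal": 1.0}, ensemble, validation_set, k=1)
```

In this toy run the always-positive classifier gets zero weight for the finance document because it is wrong on both nearby validation documents, so the vote follows the keyword classifier. Recomputing the weights per document is what makes the scheme dynamic: a classifier that has gone stale after a concept drift loses influence only in the regions where it now errs.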

References

Calvo B, Larranaga P, Lozano JA (2005) Learning bayesian classifiers from positive and unlabeled examples. Pattern Recognit Lett 28(16): 2375–2384
Cheng R, Kalashnikov D, Prabhakar S (2005) Learning from positive and unlabeled examples. Theor Comput Sci 38(1): 70–83
Didaci L, Giacinto G, Roli F, Marcialis GL (2005) A study on the performances of dynamic classifier selection based on local accuracy estimation. Pattern Recognit 38(11): 2188–2191
Dietterich TG (2002) Ensemble methods in machine learning. In: Proceedings of the first international workshop on multiple classifier systems, pp 1–15
Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining (KDD’00). Boston, pp 71–80
Fan W (2004) Systematic data selection to mine concept-drifting data streams. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (KDD’04), ACM Press, pp 128–137
Fan W, Huang YA, Wang H, Yu PS (2004a) Active mining of data streams. In: Proceedings of the fourth SIAM international conference on data mining (SDM’04), pp 457–461
Fan W, Huang YA, Yu PS (2004b) Decision tree evolution using limited number of labeled data items from drifting data streams. In: Proceedings of the fourth IEEE international conference on data mining (ICDM’04), pp 379–382
Fung GPC, Yu JX, Lu H, Yu PS (2006) Text classification without negative examples revisit. IEEE Trans Knowl Data Eng 18(1): 6–20
Grossi V, Turini F (2010) Stream mining: a novel architecture for ensemble-based classification. Knowl Inf Syst: 1–35. doi:10.1007/s10115-011-0378-4
Huang S, Dong Y (2007) An active learning system for mining time-changing data streams. Intell Data Anal 11(4): 401–419
Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining (KDD’01), pp 97–106
Klinkenberg R, Joachims T (2000) Detecting concept drift with support vector machines. In: Proceedings of the seventeenth international conference on machine learning (ICML’00), pp 487–494
Ko A, Sabourin R, Britto A Jr (2008) From dynamic classifier selection to dynamic ensemble selection. Pattern Recognit 41(5): 1718–1731
Kolter JZ, Maloof MA (2003) Dynamic weighted majority: a new ensemble method for tracking concept drift. In: Proceedings of the third international conference on data mining (ICDM’03), pp 123–130
Lewis DD, Yang Y, Rose TG, Li F (2004) RCV1: a new benchmark collection for text categorization research. J Mach Learn Res 5: 361–397
Li C, Zhang Y, Li X (2009a) OcVFDT: one-class very fast decision tree for one-class classification of data streams. In: Proceedings of the third international workshop on knowledge discovery from sensor data. Paris, pp 79–86
Li X, Liu B (2003) Learning to classify texts using positive and unlabeled data. In: International joint conference on artificial intelligence (IJCAI’03), pp 587–594
Li X, Liu B (2005) Learning from positive and unlabeled examples with different data distributions. In: Proceedings of European conference on machine learning (ECML’05), pp 218–229
Li XL, Yu PS, Liu B, Ng SK (2009b) Positive unlabeled learning for data stream classification. In: Proceedings of the ninth SIAM international conference on data mining (SDM’09), pp 257–268
Liu B, Lee WS, Yu PS, Li X (2002) Partially supervised classification of text documents. In: Proceedings of the nineteenth international conference on machine learning (ICML’02)
Liu B, Dai Y, Li X, Lee WS, Yu PS (2003) Building text classifiers using positive and unlabeled examples. In: Proceedings of the third IEEE international conference on data mining (ICDM’03), pp 179–186
Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Inf Process Manag 24(5): 513–523
Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13(7): 1443–1471
Sebastiani F (2002) Machine learning in automated text categorization. ACM Comput Surv 34(1): 1–47
Street W, Kim Y (2001) A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of the seventh international conference on knowledge discovery and data mining (KDD’01), pp 377–382
Tsymbal A, Pechenizkiy M, Cunningham P, Puuronen S (2008) Dynamic integration of classifiers for handling concept drift. Inf Fusion 9(1): 56–68
Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the ninth international conference on knowledge discovery and data mining (KDD’03), pp 226–235
Widmer G, Kubat M (1993) Effective learning in dynamic environments by explicit context tracking. In: European conference on machine learning (ECML’93). Springer, Berlin, pp 227–243
Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1): 69–101
Widyantoro D, Yen J (2005) Relevant data expansion for learning concept drift from sparsely labeled data. IEEE Trans Knowl Data Eng 17(3): 401–412
Woods K, Kegelmeyer WP Jr, Bowyer K (1997) Combination of multiple classifiers using local accuracy estimates. IEEE Trans Pattern Anal Mach Intell 19(4): 405–410
Wozniak M (2010) A hybrid decision tree training method using data streams. Knowl Inf Syst: 1–13. doi:10.1007/s10115-010-0345-5
Wu S, Yang C, Zhou J (2006) Clustering-training for data stream mining. In: Proceedings of the sixth IEEE international conference on data mining workshops (ICDMW’06), pp 653–656
Yu H, Han J, Chang KCC (2004) PEBL: web page classification without negative examples. IEEE Trans Knowl Data Eng 16(1): 70–81
Zhang B, Zuo W (2008) Learning from positive and unlabeled examples: a survey. In: International symposiums on information processing, IEEE Computer Society, Los Alamitos, pp 650–654
Zhang P, Zhu X, Shi Y (2008a) Categorizing and mining concept drifting data streams. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’08). Las Vegas, pp 812–820
Zhang Y, Jin X (2006) An automatic construction and organization strategy for ensemble learning on data streams. ACM SIGMOD Rec 35(3): 28–33
Zhang Y, Li X, Orlowska M (2008b) One-class classification of text streams with concept drift. In: Proceedings of the 2008 IEEE international conference on data mining workshops (ICDMW’08), pp 116–125
Zhou ZH, Wu J, Tang W (2002) Ensembling neural networks: many could be better than all. Artif Intell 137(1–2): 239–263
Zhu X, Wu X, Yang Y (2006) Effective classification of noisy data streams with attribute-oriented dynamic classifier selection. Knowl Inf Syst 9(3): 339–363
Zhu X, Zhang P, Lin X, Shi Y (2007) Active learning from data streams. In: Proceedings of the seventh international conference on data mining (ICDM’07), pp 757–762
Zhu X, Ding W, Yu P, Zhang C (2010) One-class learning and concept summarization for data streams. Knowl Inf Syst: 1–31. doi:10.1007/s10115-010-0331-y

Author information

Authors and Affiliations

College of Information Engineering, Northwest A&F University, Yangling, China
Shirui Pan & Yang Zhang
School of Information Technology and Electrical Engineering, University of Queensland, Brisbane, Australia
Xue Li

Corresponding author

Correspondence to Yang Zhang.

About this article

Cite this article

Pan, S., Zhang, Y. & Li, X. Dynamic classifier ensemble for positive unlabeled text stream classification. Knowl Inf Syst 33, 267–287 (2012). https://doi.org/10.1007/s10115-011-0469-2

Received: 12 August 2010
Revised: 02 October 2011
Accepted: 09 December 2011
Published: 23 December 2011
Issue Date: November 2012
DOI: https://doi.org/10.1007/s10115-011-0469-2
