
One-class learning and concept summarization for data streams

  • Regular paper
Knowledge and Information Systems

Abstract

In this paper, we formulate a new research problem of concept learning and summarization for one-class data streams. The main objectives are to (1) allow users to label instance groups, instead of single instances, as positive samples for learning, and (2) summarize the concepts labeled by users over the whole stream. Batch labeling raises serious issues for stream-oriented concept learning and summarization, because a labeled instance group may contain non-positive samples and users may change their labeling interests at any time. As a result, the positive samples labeled by users over the whole stream may be inconsistent and contain multiple concepts. To resolve these issues, we propose a one-class learning and summarization (OCLS) framework with two major components. In the first component, we propose a vague one-class learning (VOCL) module for concept learning from data streams, using an ensemble of classifiers with instance-level and classifier-level weighting strategies. In the second component, we propose a one-class concept summarization (OCCS) module that uses clustering techniques and a Markov model to summarize concepts labeled by users in a single scan of the stream data. Experimental results on synthetic and real-world data streams demonstrate that the proposed VOCL module outperforms its peers for learning concepts from vaguely labeled stream data. The OCCS module is also able to rebuild a high-level summary of the concepts marked by users over the stream.
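
As a rough illustration of the Markov-model idea mentioned for concept summarization, the sketch below estimates first-order transition probabilities between concept labels in a single pass over a sequence. It is not the OCCS module itself: the function name `transition_probabilities`, the example label sequence, and the assumption that each data chunk has already been assigned one concept label (for instance by a clustering step) are all hypothetical.

```python
from collections import defaultdict


def transition_probabilities(labels):
    """Estimate first-order Markov transition probabilities in one pass.

    `labels` is assumed to be a sequence of concept labels, one per stream
    chunk (a hypothetical input; the abstract does not specify the format).
    """
    counts = defaultdict(lambda: defaultdict(int))
    prev = None
    for cur in labels:
        if prev is not None:
            counts[prev][cur] += 1  # count the observed transition prev -> cur
        prev = cur

    # Normalize each row of counts into a probability distribution.
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


# Hypothetical sequence of concepts labeled over six consecutive chunks.
chunk_concepts = ["A", "A", "B", "A", "B", "B"]
probs = transition_probabilities(chunk_concepts)
```

Because the counts are updated as each label arrives, the sequence never needs to be revisited, which is in the spirit of the single-scan constraint described in the abstract.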

References

  1. Aggarwal C (2007) Data streams: models and algorithms. Springer, New York

  2. Aggarwal C (2009) On classification and segmentation of massive audio data streams. Knowl Inf Syst 20(2): 137–156

  3. Akyildiz I, Su W, Sankarasubramaniam Y, Cayirci E (2002) Wireless sensor networks: a survey. Comput Netw 38: 393–422

  4. Breiman L, Friedman J, Stone C, Olshen R (1984) Classification and regression trees. Chapman & Hall/CRC

  5. Brodley C, Friedl M (1999) Identifying mislabeled training data. J AI Res (JAIR) 11: 131–167

  6. Chang C-C, Lin C-J. LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm

  7. Chen Y, Zhou X, Huang T (2001) One-class SVM for learning in image retrieval. In: Proceedings of the international conference on image processing

  8. Cho M, Pei J, Wang K (2007) Answering ad hoc aggregate queries from data streams using prefix aggregate trees. Knowl Inf Syst 12(3): 301–329

  9. Dang X, Ng W, Ong K (2008) Online mining of frequent sets in data streams with error guarantee. Knowl Inf Syst 16(2): 245–258

  10. Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodol) 39(1): 1–38

  11. Dietterich T (2000) Ensemble methods in machine learning. In: Proceedings of the 1st workshop on multiple classifier systems

  12. Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of KDD

  13. Elkan C, Noto K (2008) Learning classifiers from only positive and unlabeled data. In: Proceedings of SIGKDD

  14. Fan W, Huang Y, Wang H, Yu P (2004) Active mining of data streams. In: Proceedings of SIAM international conference on data mining

  15. Gao J, Fan W, Han W, Yu P (2007) A general framework for mining concept-drifting data streams with skewed distributions. In: Proceedings of SIAM international conference on data mining

  16. Goh K, Chang E, Li B (2005) Using one-class and two-class SVMs for multiclass image annotation. IEEE Trans Knowl Data Eng 17(10): 1333–1346

  17. Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of KDD

  18. Japkowicz N (1999) Concept-learning in the absence of counter-examples: an autoassociation-based approach to classification. Ph.D. Dissertation, Rutgers, the State University of New Jersey

  19. Jiang B, Zhang M, Zhang X (2007) OSCAR: one-class SVM for accurate recognition of cis-elements. Bioinformatics 23(21): 2823–2828

  20. Koppel M, Schler J (2004) Authorship verification as a one-class classification problem. In: Proceedings of ICML

  21. Li X, Yu P, Liu B, Ng S (2009) Positive unlabeled learning for data stream classification. In: Proceedings of SDM

  22. Liu B, Dai Y, Lee WS, Yu PS, Li X (2003) Building text classifiers using positive and unlabeled examples. In: Proceedings of ICDM

  23. MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability

  24. Manevitz L, Yousef M (2002) One-class SVMs for document classification. J Mach Learn Res 2: 139–154

  25. Meyn SP, Tweedie RL (2008) Markov chains and stochastic stability, 2nd edn. Cambridge University Press

  26. Newman D, Hettich S, Blake C, Merz C (1998) UCI repository of machine learning

  27. Nigam K, McCallum A, Thrun S, Mitchell T (1999) Text classification from labeled and unlabeled documents using EM. Mach Learn 1–34

  28. Perdisci R, Gu G, Lee W (2006) Using an ensemble of one-class SVM classifiers to harden payload-based anomaly detection systems. In: Proceedings of ICDM

  29. Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers

  30. Schölkopf B, Platt J, Shawe-Taylor J, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13: 1443–1471

  31. Street W, Kim Y (2001) A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of KDD

  32. Tax DMJ (2001) One-class classification, concept learning in the absence of counter examples. Ph.D. Thesis, Delft University of Technology, Delft, The Netherlands

  33. Von Neumann J (1951) Various techniques used in connection with random digits. National Bureau of Standards Applied Mathematics Series 12: 36–38

  34. Wang H, Fan W, Yu P, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of KDD

  35. Wang H et al (2006) Suppressing model overfitting in mining concept-drifting data streams. In: Proceedings of KDD

  36. Wang Q, Lopes L (2005) One-class learning for human-robot interaction. In: Emerging solutions for future manufacturing systems. Springer, pp 489–498

  37. Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23: 69–101

  38. Witten I, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann

  39. Yang Y, Wu X, Zhu X (2005) Combining proactive and reactive predictions of data streams. In: Proceedings of KDD

  40. Yousef M, Jung S, Showe L, Showe M (2008) Learning from positive examples when the negative class is undetermined - microRNA gene identification. Algorithms Mol Biol 3: 2

  41. Yu H, Han J, Chang KC-C (2004) PEBL: web page classification without negative examples. IEEE Trans Knowl Data Eng 16(1): 70–81

  42. Zadrozny B, Langford J, Abe N (2003) Cost-sensitive learning by cost-proportionate example weighting. In: Proceedings of ICDM

  43. Zhang P, Zhu X, Shi Y (2008) Categorizing and mining concept drifting data streams. In: Proceedings of the 14th KDD Conference

  44. Zhang P, Zhu X, Guo L (2009) Mining data streams with labeled and unlabeled training examples. In: Proceedings of ICDM

  45. Zhu X (2010) Stream Data Mining Repository, http://www.cse.fau.edu/~xqzhu/stream.html

  46. Zhu X, Wu X (2004) Class noise versus attribute noise: a quantitative study of their impacts. Artif Intell Rev 22: 177–210

  47. Zhu X, Wu X (2006) Scalable representative instance selection and ranking. In: Proceedings of the 18th international conference on pattern recognition (ICPR)

  48. Zhu X, Wu X, Chen Q (2003) Eliminating class noise in large datasets. In: Proceedings of ICML

  49. Zhu X, Wu X, Yang Y (2006) Effective classification of noisy data streams with attribute-oriented dynamic classifier selection. Knowl Inf Syst 9(3): 339–363

  50. Zhu X, Wu X, Zhang C (2009) One-class vague learning for data streams. In: Proceedings of the 9th IEEE international conference on data mining, Miami, FL

  51. Zhu X, Zhang P, Lin X, Shi Y (2007) Active learning from data streams. In: Proceedings of the 7th IEEE ICDM conference

  52. Zhu X, Zhang P, Wu X, He D, Zhang C, Shi Y (2008) Cleansing noisy data streams. In: Proceedings of the 8th IEEE international conference on data mining (ICDM), Pisa, Italy

Author information

Corresponding author

Correspondence to Xingquan Zhu.

Additional information

A preliminary version of this paper [50], without concept summarization, was published in the Proceedings of the 9th IEEE International Conference on Data Mining, Miami, FL, 2009. This work is supported in part by the Australian Research Council Discovery Project under grant DP1093762 and by the US NSF through grant IIS-0905215.

About this article

Cite this article

Zhu, X., Ding, W., Yu, P.S. et al. One-class learning and concept summarization for data streams. Knowl Inf Syst 28, 523–553 (2011). https://doi.org/10.1007/s10115-010-0331-y

  • DOI: https://doi.org/10.1007/s10115-010-0331-y
