Abstract
In this paper, we formulate a new research problem: concept learning and summarization for one-class data streams. The main objectives are to (1) allow users to label instance groups, instead of single instances, as positive samples for learning, and (2) summarize the concepts labeled by users over the whole stream. Batch labeling raises serious issues for stream-oriented concept learning and summarization, because a labeled instance group may contain non-positive samples and users may change their labeling interests at any time. As a result, the positive samples labeled by users over the whole stream may be inconsistent and contain multiple concepts. To resolve these issues, we propose a one-class learning and summarization (OCLS) framework with two major components. In the first component, we propose a vague one-class learning (VOCL) module for concept learning from data streams, using an ensemble of classifiers with instance-level and classifier-level weighting strategies. In the second component, we propose a one-class concept summarization (OCCS) module that uses clustering techniques and a Markov model to summarize the concepts labeled by users with only a single scan of the stream data. Experimental results on synthetic and real-world data streams demonstrate that the proposed VOCL module outperforms its peers in learning concepts from vaguely labeled stream data. The OCCS module is also able to rebuild a high-level summary of the concepts marked by users over the stream.
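The idea of summarizing user-labeled concepts with a Markov model in a single pass can be illustrated with a minimal sketch. It assumes, hypothetically, that each chunk of user-labeled positives has already been mapped to a concept ID by some clustering step; the class `ConceptTransitionSummary` and its methods are illustrative names, and this is not the OCCS module itself. The sketch counts transitions between consecutive concepts as chunks arrive and normalizes those counts into first-order (maximum-likelihood) transition probabilities.

```python
from collections import defaultdict


class ConceptTransitionSummary:
    """Hypothetical single-pass, first-order Markov summary of concept IDs.

    Assumes each stream chunk has already been assigned a concept ID
    (e.g., by clustering its labeled positives). Illustrative only; this
    is not the OCCS algorithm described in the paper.
    """

    def __init__(self):
        # counts[a][b] = number of times concept a was immediately followed by b
        self.counts = defaultdict(lambda: defaultdict(int))
        self.prev = None

    def update(self, concept_id):
        # Called once per chunk, so the stream is scanned only once.
        if self.prev is not None:
            self.counts[self.prev][concept_id] += 1
        self.prev = concept_id

    def transition_probs(self):
        # Normalize each row of counts into transition probabilities.
        probs = {}
        for a, row in self.counts.items():
            total = sum(row.values())
            probs[a] = {b: c / total for b, c in row.items()}
        return probs


if __name__ == "__main__":
    summary = ConceptTransitionSummary()
    for cid in ["A", "A", "B", "A", "B", "B"]:
        summary.update(cid)
    print(summary.transition_probs())
```

For the toy sequence A, A, B, A, B, B, concept A is followed by B in two of its three transitions, so the sketch estimates P(B | A) = 2/3. Memory grows with the number of distinct concepts rather than with the stream length, which is what makes a one-pass summary feasible.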
References
Aggarwal C (2007) Data streams: models and algorithms. Springer, New York
Aggarwal C (2009) On classification and segmentation of massive audio data streams. Knowl Inf Syst 20(2): 137–156
Akyildiz I, Su W, Sankarasubramaniam Y, Cayirci E (2002) Wireless sensor networks: a survey. Comput Netw 38: 393–422
Breiman L, Friedman J, Stone C, Olshen R (1984) Classification and regression trees. Chapman & Hall/CRC
Brodley C, Friedl M (1999) Identifying mislabeled training data. J AI Res (JAIR) 11: 131–167
Chang C-C, Lin C-J LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm
Chen Y, Zhou X, Huang T (2001) One-class SVM for learning in image retrieval. In: Proceedings of international conference on image processing
Cho M, Pei J, Wang K (2007) Answering ad hoc aggregate queries from data streams using prefix aggregate trees. Knowl Inf Syst 12(3): 301–329
Dang X, Ng W, Ong K (2008) Online mining of frequent sets in data streams with error guarantee. Knowl Inf Syst 16(2): 245–258
Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodol) 39(1): 1–38
Dietterich T (2000) Ensemble methods in machine learning. In: Proceedings of the 1st workshop on multiple classifier systems
Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of KDD
Elkan C, Noto K (2008) Learning classifiers from only positive and unlabeled data. In: Proceedings of SIGKDD
Fan W, Huang Y, Wang H, Yu P (2004) Active mining of data streams. In: Proceedings of SIAM international conference on data mining
Gao J, Fan W, Han W, Yu P (2007) A general framework for mining concept-drifting data streams with skewed distributions. In: Proceedings of SIAM international conference on data mining
Goh K, Chang E, Li B (2005) Using one-class and two-class SVMs for multiclass image annotation. IEEE Trans Knowl Data Eng 17(10): 1333–1346
Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of KDD
Japkowicz N (1999) Concept-learning in the absence of counter-examples: an autoassociation-based approach to classification. Ph.D. Dissertation, Rutgers, the State University of New Jersey
Jiang B, Zhang M, Zhang X (2007) OSCAR: one-class SVM for accurate recognition of cis-elements. Bioinformatics 23(21): 2823–2828
Koppel M, Schler J (2004) Authorship verification as a one-class classification problem. In: Proceedings of ICML
Li X, Yu P, Liu B, Ng S (2009) Positive unlabeled learning for data stream classification. In: Proceedings of SDM
Liu B, Dai Y, Lee WS, Yu PS, Li X (2003) Building text classifiers using positive and unlabeled examples. In: Proceedings of ICDM
MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley symposium on mathematical statistics and probability
Manevitz L, Yousef M (2002) One-class SVMs for document classification. J Mach Learn Res 2: 139–154
Meyn SP, Tweedie RL (2008) Markov chains and stochastic stability, 2nd edn. Cambridge University Press
Newman D, Hettich S, Blake C, Merz C (1998) UCI repository of machine learning databases
Nigam K, McCallum A, Thrun S, Mitchell T (1999) Text classification from labeled and unlabeled documents using EM. Mach Learn 1–34
Perdisci R, Gu G, Lee W (2006) Using an ensemble of one-class SVM classifiers to harden payload-based anomaly detection systems. In: Proceedings of ICDM
Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann
Schölkopf B, Platt J, Shawe-Taylor J, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13: 1443–1471
Street W, Kim Y (2001) A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of KDD
Tax DMJ (2001) One-class classification: concept learning in the absence of counter-examples. Ph.D. Thesis, Delft University of Technology, Delft, Netherlands
Von Neumann J (1951) Various techniques used in connection with random digits. Natl Bur Stand Appl Math Ser 12: 36–38
Wang H, Fan W, Yu P, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of KDD
Wang H et al (2006) Suppressing model overfitting in mining concept-drifting data streams. In: Proceedings of KDD
Wang Q, Lopes L (2005) One-class learning for human-robot interaction. In: Emerging solutions for future manufacturing systems. Springer, pp 489–498
Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23: 69–101
Witten I, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann
Yang Y, Wu X, Zhu X (2005) Combining proactive and reactive predictions of data streams. In: Proceedings of KDD
Yousef M, Jung S, Showe L, Showe M (2008) Learning from positive examples when the negative class is undetermined - microRNA gene identification. Algorithms Mol Biol 3: 2
Yu H, Han J, Chang KC-C (2004) PEBL: web page classification without negative examples. IEEE Trans Knowl Data Eng 16(1): 70–81
Zadrozny B, Langford J, Abe N (2003) Cost-sensitive learning by cost-proportionate example weighting. In: Proceedings of ICDM
Zhang P, Zhu X, Shi Y (2008) Categorizing and mining concept drifting data streams. In: Proceedings of the 14th KDD Conference
Zhang P, Zhu X, Guo L (2009) Mining data streams with labeled and unlabeled training examples. In: Proceedings of ICDM
Zhu X (2010) Stream Data Mining Repository, http://www.cse.fau.edu/~xqzhu/stream.html
Zhu X, Wu X (2004) Class noise versus attribute noise: a quantitative study of their impacts. Artif Intell Rev 22: 177–210
Zhu X, Wu X (2006) Scalable representative instance selection and ranking. In: Proceedings of the 18th international conference on pattern recognition (ICPR)
Zhu X, Wu X, Chen Q (2003) Eliminating class noise in large datasets. In: Proceedings of ICML
Zhu X, Wu X, Yang Y (2006) Effective classification of noisy data streams with attribute-oriented dynamic classifier selection. Knowl Inf Syst 9(3): 339–363
Zhu X, Wu X, Zhang C (2009) One-class vague learning for data streams. In: Proceedings of the 9th IEEE international conference on data mining, Miami, FL
Zhu X, Zhang P, Lin X, Shi Y (2007) Active learning from data streams. In: Proceedings of the 7th IEEE ICDM conference
Zhu X, Zhang P, Wu X, He D, Zhang C, Shi Y (2008) Cleansing noisy data streams. In: Proceedings of the 8th IEEE international conference on data mining (ICDM), Pisa, Italy
Additional information
A preliminary version of this paper [50], without the concept summarization component, was published in the Proceedings of the 9th IEEE International Conference on Data Mining (ICDM), Miami, FL, 2009. This work was supported in part by the Australian Research Council Discovery Project under grant DP1093762 and by the US NSF through grant IIS-0905215.
About this article
Cite this article
Zhu, X., Ding, W., Yu, P.S. et al. One-class learning and concept summarization for data streams. Knowl Inf Syst 28, 523–553 (2011). https://doi.org/10.1007/s10115-010-0331-y