Abstract
In many real-world data mining tasks, the connotation of the target concept may change over time. For example, a student's “learned knowledge” today may differ from his/her “learned knowledge” tomorrow, since the student's knowledge expands every day. In order to learn a model capable of making accurate predictions, the evolution of the concept must be considered, and thus a series of data sets collected at different times is needed. In many tasks, however, only a single data set is available instead of a series of data sets; in other words, only a single snapshot of the data along the time axis is available. In this paper, we formulate the Positive Class Expansion with single Snapshot (PCES) problem and discuss how it differs from existing problem settings. To show that this new problem is addressable, we propose a framework that incorporates desirable biases based on user preferences. The resulting optimization problem is solved by the Stochastic Gradient Boosting with Double Target approach, which achieves encouraging performance on PCES problems in our experiments.
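The abstract names the Stochastic Gradient Boosting with Double Target approach only at a high level; the full formulation appears in the body of the paper. As a rough illustration of the underlying optimization machinery, the following is a minimal sketch of generic stochastic gradient boosting (squared loss, shallow regression trees, random subsampling per round). All function names and parameters here are illustrative assumptions, and this is not the Double Target variant proposed in the paper.

import numpy as np
from sklearn.tree import DecisionTreeRegressor


def sgb_fit(X, y, n_rounds=100, learning_rate=0.1, subsample=0.5,
            max_depth=3, seed=0):
    """Fit an additive model F(x) = f0 + lr * sum_m h_m(x) by stochastic
    gradient boosting with squared loss (generic sketch, not Double Target)."""
    rng = np.random.RandomState(seed)
    f0 = float(np.mean(y))                       # constant initial model
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        residual = y - pred                      # negative gradient of squared loss
        idx = rng.choice(len(y), size=int(subsample * len(y)), replace=False)
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=seed)
        tree.fit(X[idx], residual[idx])          # fit base learner on a random subsample
        pred += learning_rate * tree.predict(X)  # damped step along the fitted direction
        trees.append(tree)
    return f0, learning_rate, trees


def sgb_predict(model, X):
    """Evaluate the boosted additive model on new data X (numpy array)."""
    f0, learning_rate, trees = model
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred

Each round fits a base learner to the negative gradient of the loss on a random subsample and adds a damped copy of it to the additive model; this is the standard stochastic gradient boosting loop that the approach named in the abstract builds on.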
Cite this article
Yu, Y., Zhou, ZH. A framework for modeling positive class expansion with single snapshot. Knowl Inf Syst 25, 211–227 (2010). https://doi.org/10.1007/s10115-009-0238-7