
A framework for modeling positive class expansion with single snapshot

Regular Paper. Published in Knowledge and Information Systems.

Abstract

In many real-world data mining tasks, the connotation of the target concept may change over time. For example, the connotation of a student's "learned knowledge" today may differ from that of tomorrow, since the student's "learned knowledge" expands every day. To learn a model capable of making accurate predictions, the evolution of the concept must be considered, which requires a series of data sets collected at different times. In many tasks, however, only a single data set is available instead of such a series; in other words, only a single snapshot of the data along the time axis can be observed. In this paper, we formulate the Positive Class Expansion with single Snapshot (PCES) problem and discuss how it differs from existing problem settings. To show that this new problem is addressable, we propose a framework that incorporates desirable biases based on user preferences. The resulting optimization problem is solved by a Stochastic Gradient Boosting with Double Target approach, which achieves encouraging performance on PCES problems in our experiments.
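The abstract's Stochastic Gradient Boosting with Double Target approach builds on standard stochastic gradient boosting (Friedman 2002): each round fits a weak learner to the negative gradient of the loss, computed on a random subsample of the training set. The sketch below shows only that base technique on squared loss with decision stumps, not the paper's double-target variant; the function names `sgb_fit` and `fit_stump` and the toy data are illustrative assumptions, not from the paper.

```python
import random

def fit_stump(X, y):
    """Fit a depth-1 regression tree (stump) minimizing squared error."""
    n, best = len(X), None
    for j in range(len(X[0])):                      # try every feature
        for t in sorted(set(x[j] for x in X)):      # and every threshold
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((v - lm) ** 2 for v in left)
                   + sum((v - rm) ** 2 for v in right))
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    _, j, t, lm, rm = best
    return lambda x: lm if x[j] <= t else rm

def sgb_fit(X, y, rounds=50, lr=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting for squared loss: each round fits a
    stump to the current residuals on a random subsample."""
    rng = random.Random(seed)
    f0 = sum(y) / len(y)            # initial model: the mean target
    stumps = []
    def predict(x):
        return f0 + sum(lr * h(x) for h in stumps)
    for _ in range(rounds):
        # Negative gradient of squared loss = residuals y - F(x).
        residuals = [yi - predict(xi) for xi, yi in zip(X, y)]
        # Stochastic step: fit this round's stump on a random subsample.
        idx = rng.sample(range(len(X)), max(2, int(subsample * len(X))))
        stumps.append(fit_stump([X[i] for i in idx],
                                [residuals[i] for i in idx]))
    return predict

# Toy 1-D problem: targets jump from 0 to 1 at x = 5.
X = [[float(i)] for i in range(10)]
y = [0.0] * 5 + [1.0] * 5
model = sgb_fit(X, y)
```

The subsampling both speeds up each round and acts as a regularizer; the paper's contribution lies in replacing the single regression target with a double target that encodes the user-preference biases.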



Author information

Correspondence to Zhi-Hua Zhou.


About this article

Cite this article

Yu, Y., Zhou, ZH. A framework for modeling positive class expansion with single snapshot. Knowl Inf Syst 25, 211–227 (2010). https://doi.org/10.1007/s10115-009-0238-7
