Abstract
In many real-world data mining tasks, the connotation of the target concept may change over time. For example, a student's “learned knowledge” today may differ from his/her “learned knowledge” tomorrow, since the student's knowledge expands every day. In order to learn a model capable of making accurate predictions, the evolution of the concept must be considered, and thus a series of data sets collected at different times is needed. In many tasks, however, only a single data set is available instead of a series of data sets; in other words, only a single snapshot of the data along the time axis is available. In this paper, we formulate the Positive Class Expansion with single Snapshot (PCES) problem and discuss how it differs from existing problem settings. To show that this new problem is addressable, we propose a framework that incorporates desirable biases based on user preferences. The resulting optimization problem is solved by the Stochastic Gradient Boosting with Double Target approach, which achieves encouraging performance on PCES problems in our experiments.
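The abstract names the Stochastic Gradient Boosting with Double Target approach only at a high level; the full formulation appears in the body of the paper. As a rough illustration of the underlying optimization machinery, the following is a minimal sketch of generic stochastic gradient boosting (squared loss, shallow regression trees, random subsampling per round). All function names and parameters here are illustrative assumptions, and this is not the Double Target variant proposed in the paper.

import numpy as np
from sklearn.tree import DecisionTreeRegressor


def sgb_fit(X, y, n_rounds=100, learning_rate=0.1, subsample=0.5,
            max_depth=3, seed=0):
    """Fit an additive model F(x) = f0 + lr * sum_m h_m(x) by stochastic
    gradient boosting with squared loss (generic sketch, not Double Target)."""
    rng = np.random.RandomState(seed)
    f0 = float(np.mean(y))                       # constant initial model
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        residual = y - pred                      # negative gradient of squared loss
        idx = rng.choice(len(y), size=int(subsample * len(y)), replace=False)
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=seed)
        tree.fit(X[idx], residual[idx])          # fit base learner on a random subsample
        pred += learning_rate * tree.predict(X)  # damped step along the fitted direction
        trees.append(tree)
    return f0, learning_rate, trees


def sgb_predict(model, X):
    """Evaluate the boosted additive model on new data X (numpy array)."""
    f0, learning_rate, trees = model
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred

Each round fits a base learner to the negative gradient of the loss on a random subsample and adds a damped copy of it to the additive model; this is the standard stochastic gradient boosting loop that the approach named in the abstract builds on.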
Cite this article
Yu, Y., Zhou, ZH. A framework for modeling positive class expansion with single snapshot. Knowl Inf Syst 25, 211–227 (2010). https://doi.org/10.1007/s10115-009-0238-7