
A non-parametric semi-supervised discretization method

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Semi-supervised classification methods aim to exploit both labeled and unlabeled examples to train a predictive model. Most of these approaches make assumptions about the distribution of the classes. This article first proposes a new semi-supervised discretization method that adopts a very weakly informative prior on the data. The method discretizes the numerical domain of a continuous input variable while preserving the information relevant to class prediction. We then present an in-depth comparison of this semi-supervised method with the original supervised MODL approach, and demonstrate that the semi-supervised approach is asymptotically equivalent to the supervised approach augmented with a post-optimization of the interval bound locations.
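
The criterion studied in the paper is the Bayesian, parameter-free MODL model extended to unlabeled data; the sketch below is only a rough illustration of what supervised discretization of a continuous variable into class-informative intervals looks like in general. It uses a simple greedy information-gain splitter, not the MODL or semi-supervised criterion of the article, and the function names, the `max_intervals` cap, the `min_gain` threshold, and the toy data are all hypothetical choices made for the example.

```python
import numpy as np

def class_entropy(labels):
    """Shannon entropy of the class labels within an interval."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(x, y):
    """Return (gain, threshold) for the single cut that most reduces
    the weighted class entropy, or (0.0, None) if no cut helps."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    n = len(x)
    base = class_entropy(y)
    best_gain, best_t = 0.0, None
    # Candidate cuts lie between consecutive distinct values.
    for i in range(1, n):
        if x[i] == x[i - 1]:
            continue
        w = i / n
        gain = base - (w * class_entropy(y[:i]) + (1 - w) * class_entropy(y[i:]))
        if gain > best_gain:
            best_gain, best_t = gain, (x[i - 1] + x[i]) / 2.0
    return best_gain, best_t

def discretize(x, y, max_intervals=4, min_gain=1e-3):
    """Greedy top-down supervised discretization: repeatedly split the
    interval whose best cut yields the largest information gain."""
    bounds = [(-np.inf, np.inf)]
    while len(bounds) < max_intervals:
        candidates = []
        for lo, hi in bounds:
            mask = (x > lo) & (x <= hi)
            if mask.sum() < 2:
                continue
            gain, t = best_split(x[mask], y[mask])
            if t is not None and gain > min_gain:
                candidates.append((gain, (lo, hi), t))
        if not candidates:
            break
        _, (lo, hi), t = max(candidates)
        bounds.remove((lo, hi))
        bounds += [(lo, t), (t, hi)]
    return sorted(bounds)

# Hypothetical toy data: a continuous feature whose low/high ranges
# favour different classes, with 10% label noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = (x > 6.5).astype(int) ^ (rng.uniform(size=200) < 0.1)
print(discretize(x, np.asarray(y, dtype=int)))
```

In this toy setting the splitter recovers a cut near 6.5; the paper's contribution lies in how the number and location of such cuts are chosen without user parameters, using both labeled and unlabeled points.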

References

  1. Berger J (2006) The case for objective Bayesian analysis. Bayesian Anal 1(3): 385–402

  2. Blum A, Mitchell T (1998) Combining labeled and unlabeled data with co-training. In: COLT ’98: Proceedings of the eleventh annual conference on Computational learning theory. ACM Press, New York, pp 92–100

  3. Boullé M (2005) A Bayes optimal approach for partitioning the values of categorical attributes. J Mach Learn Res 6: 1431–1452

  4. Boullé M (2006) MODL: a Bayes optimal discretization method for continuous attributes. Mach Learn 65(1): 131–165

  5. Catlett J (1991) On changing continuous attributes into ordered discrete attributes. In: EWSL-91: Proceedings of the European working session on learning on machine learning. Springer, New York, pp 164–178

  6. Chapelle O, Schölkopf B, Zien A (2007) Semi-supervised learning. MIT Press, Cambridge

  7. Dougherty J, Kohavi R, Sahami M (1995) Supervised and unsupervised discretization of continuous features. In: International conference on machine learning, pp 194–202

  8. Fawcett T (2003) ROC graphs: notes and practical considerations for data mining researchers. Technical Report HPL-2003-4, HP Labs. http://citeseer.ist.psu.edu/fawcett03roc.html

  9. Fayyad U, Irani K (1992) On the handling of continuous-valued attributes in decision tree generation. Mach Learn 8: 87–102

  10. Fayyad U, Piatetsky-Shapiro G, Smyth P (1996) From data mining to knowledge discovery: an overview. Adv Knowl Discov Data Min 1–34

  11. Fujino A, Ueda N, Saito K (2007) A hybrid generative/discriminative approach to text classification with additional information. Inf Process Manage 43: 379–392

  12. Holte R (1993) Very simple classification rules perform well on most commonly used datasets. Mach Learn 11: 63–91

  13. Jin R, Breitbart Y, Muoh C (2009) Data discretization unification. Knowl Inf Syst 19: 1–29

  14. Kohavi R, Sahami M (1996) Error-based and entropy-based discretization of continuous features. In: Proceedings of the second international conference on knowledge discovery and data mining, pp 114–119

  15. Langley P, Iba W, Thomas K (1992) An analysis of Bayesian classifiers. In: Proceedings of the tenth national conference on artificial intelligence. AAAI Press, pp 223–228

  16. Liu H, Hussain F, Tan C, Dash M (2002) Discretization: an enabling technique. Data Min Knowl Discov 6(4): 393–423

  17. Maeireizo B, Litman D, Hwa R (2004) Analyzing the effectiveness and applicability of co-training. In: ACL ’04: the companion proceedings of the 42nd annual meeting of the association for computational linguistics

  18. Newman DJ, Hettich S, Blake CL, Merz CJ (1998) UCI repository of machine learning databases. Department of Information and Computer Sciences, University of California, Irvine. http://www.ics.uci.edu/~mlearn/MLRepository.html

  19. Pyle D (1999) Data preparation for data mining. Morgan Kaufmann, San Francisco, p 19

  20. Rissanen J (1978) Modeling by shortest data description. Automatica 14: 465–471

  21. Rosenberg C, Hebert M, Schneiderman H (2005) Semi-supervised self-training of object detection models. In: Seventh IEEE workshop on applications of computer vision

  22. Settles B (2009) Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison

  23. Shannon C (1948) A mathematical theory of communication. Key papers in the development of information theory

  24. Sugiyama M, Krauledat M, Müller K (2007) Covariate shift adaptation by importance weighted cross validation. J Mach Learn Res 8: 985–1005

  25. Sugiyama M, Müller K (2005) Model selection under covariate shift. In: ICANN, international conference on artificial neural networks: formal models and their applications

  26. Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Yu PS, Zhou Z, Steinbach M, Hand DJ, Steinberg D (2008) Top 10 algorithms in data mining. Knowl Inf Syst 14(1)

  27. Zhou ZH, Li M (2009) Semi-supervised learning by disagreement. Knowl Inf Syst doi:10.1007/s10115-009-0209-z

  28. Zighed D, Rakotomalala R (2000) Graphes d’induction. Hermes, France

Author information

Corresponding author

Correspondence to Alexis Bondu.

Cite this article

Bondu, A., Boullé, M. & Lemaire, V. A non-parametric semi-supervised discretization method. Knowl Inf Syst 24, 35–57 (2010). https://doi.org/10.1007/s10115-009-0230-2
