
Semi-supervised learning with an imperfect supervisor

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Real-life applications often involve huge data sets with misclassified or only partially classified training data. Semi-supervised learning and learning in the presence of label noise have recently emerged as paradigms in the machine learning community for coping with such problems. This paper describes a new discriminant algorithm for semi-supervised learning. The algorithm optimizes the classification maximum likelihood (CML) of a set of labeled and unlabeled data, using a discriminant extension of the Classification Expectation Maximization (CEM) algorithm. We further extend this algorithm by modeling imperfections in the estimated class labels of the unlabeled data; the parameters of this label-error model are learned jointly with those of the semi-supervised classifier. We demonstrate the effectiveness of the approach through extensive experiments on several datasets.
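The abstract only sketches the method, so the following Python snippet is a rough, hypothetical illustration of the two ideas it combines: a hard-assignment classification-EM loop over labeled and unlabeled data, and a learned label-error (confusion) model for the pseudo-labels. It is not the authors' algorithm; the logistic-regression discriminant, the posterior-correction rule, the confusion-matrix update, and all variable names are assumptions made for this example.

```python
# A minimal sketch (not the paper's exact algorithm) of semi-supervised
# classification EM (CEM) with a label-error model, on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: two 2-D Gaussian classes, a few labeled points and a large
# unlabeled pool (all quantities here are illustrative).
n_lab, n_unl = 20, 500
X_l = np.vstack([rng.normal(-1.5, 1.0, (n_lab // 2, 2)),
                 rng.normal(+1.5, 1.0, (n_lab // 2, 2))])
y_l = np.repeat([0, 1], n_lab // 2)
X_u = np.vstack([rng.normal(-1.5, 1.0, (n_unl // 2, 2)),
                 rng.normal(+1.5, 1.0, (n_unl // 2, 2))])
y_u_true = np.repeat([0, 1], n_unl // 2)   # held out, evaluation only

clf = LogisticRegression().fit(X_l, y_l)
# Label-error model: err[j, k] ~ P(estimated label k | true class j),
# initialized close to the identity (labels assumed mostly correct).
err = np.full((2, 2), 0.1) + 0.8 * np.eye(2)

for _ in range(20):
    # E-step: classifier posteriors on the unlabeled pool, corrected by
    # the label-error model: q(j | x, y_hat) is proportional to
    # p(j | x) * err[j, y_hat].
    p = clf.predict_proba(X_u)          # p[i, j] = P(class j | x_i)
    y_hat = p.argmax(axis=1)            # provisional (imperfect) labels
    q = p * err[:, y_hat].T
    q /= q.sum(axis=1, keepdims=True)

    # C-step: hard assignments (the "classification" step of CEM).
    y_u = q.argmax(axis=1)

    # M-step, classifier: refit on labeled plus pseudo-labeled data.
    clf.fit(np.vstack([X_l, X_u]), np.concatenate([y_l, y_u]))

    # M-step, label-error model: re-estimate the confusion between the
    # classifier's predictions and the true labels on the labeled set.
    pred_l = clf.predict(X_l)
    for j in range(2):
        for k in range(2):
            err[j, k] = np.mean(pred_l[y_l == j] == k)
    err = np.clip(err, 1e-3, None)
    err /= err.sum(axis=1, keepdims=True)

print("accuracy on the unlabeled pool:",
      np.mean(clf.predict(X_u) == y_u_true))
```

The hard assignment in the C-step, rather than EM's soft weighting, is what makes this a CEM-style procedure; the confusion matrix lets the M-step discount pseudo-labels that the current classifier itself tends to get wrong, which is the role the paper assigns to its label-error model.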



Author information

Corresponding author

Correspondence to Massih R. Amini.

Additional information

Massih R. Amini is currently an assistant professor at the University of Pierre and Marie Curie (Paris 6). He received an engineering degree in computer science from the École Supérieure d'Informatique, a computer science engineering school in Paris, in 1995. He completed his master's thesis in artificial intelligence in 1997 and obtained his PhD from the University of Pierre and Marie Curie in 2001. His research interests include statistical learning and text mining.

Patrick Gallinari is currently a professor at the University of Pierre and Marie Curie (Paris 6) and head of its computer science laboratory (LIP6). His main research activity over the last 15 years has been in the field of statistical machine learning. He has also contributed to the development of machine learning techniques for various application domains, including information retrieval and text mining, user modelling, man–machine interaction, and pen interfaces.

Cite this article

Amini, M.R., Gallinari, P. Semi-supervised learning with an imperfect supervisor. Knowl Inf Syst 8, 385–413 (2005). https://doi.org/10.1007/s10115-005-0219-4

