Abstract
Real-life applications may involve huge data sets with misclassified or partially classified training data. Semi-supervised learning and learning in the presence of label noise have recently emerged as paradigms in the machine learning community for coping with such problems. This paper describes a new discriminant algorithm for semi-supervised learning. The algorithm optimizes the classification maximum likelihood (CML) of a set of labeled and unlabeled data, using a discriminant extension of the Classification Expectation Maximization (CEM) algorithm. We further extend this algorithm by modeling imperfections in the estimated class labels of the unlabeled data. The parameters of this label-error model are learned jointly with the parameters of the semi-supervised classifier. We demonstrate the effectiveness of the approach through extensive experiments on different datasets.
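To make the CEM iteration concrete, the following is a minimal sketch of semi-supervised Classification EM on synthetic data. It is an illustration only: it uses a simple generative model (two one-dimensional Gaussian class-conditional densities) rather than the discriminant variant and the label-error model the paper actually proposes, and all variable names are chosen for this example. The distinguishing feature of CEM over plain EM is the C-step, which hard-assigns each unlabeled point to its most probable class before the M-step.

```python
# Hedged sketch of semi-supervised CEM (Classification EM), assuming two
# 1-D Gaussian class-conditional densities. Labeled points keep their given
# labels; unlabeled points are reclassified at each C-step.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two well-separated Gaussian classes.
x_l = np.concatenate([rng.normal(-2, 1, 20), rng.normal(2, 1, 20)])
y_l = np.array([0] * 20 + [1] * 20)                                    # labeled
x_u = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])  # unlabeled

# Initialize parameters from the labeled data only.
mu = np.array([x_l[y_l == 0].mean(), x_l[y_l == 1].mean()])
sigma = np.array([1.0, 1.0])
prior = np.array([0.5, 0.5])

def log_density(x, k):
    """Log of the Gaussian density for class k, up to an additive constant."""
    return -0.5 * ((x - mu[k]) / sigma[k]) ** 2 - np.log(sigma[k])

for _ in range(10):
    # E-step: log-posterior of each class for every unlabeled point.
    logp = np.stack([np.log(prior[k]) + log_density(x_u, k) for k in (0, 1)])
    # C-step: hard assignment (this is what distinguishes CEM from EM).
    y_u = logp.argmax(axis=0)
    # M-step: re-estimate parameters from labeled + classified unlabeled data,
    # which maximizes the classification likelihood (CML) for this partition.
    x_all = np.concatenate([x_l, x_u])
    y_all = np.concatenate([y_l, y_u])
    for k in (0, 1):
        xk = x_all[y_all == k]
        mu[k] = xk.mean()
        sigma[k] = xk.std() + 1e-6
        prior[k] = len(xk) / len(x_all)
```

After a few iterations the class means converge near the true cluster centers, with the unlabeled points contributing to the estimates through their C-step labels.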
Additional information
Massih R. Amini is currently an assistant professor at the University of Pierre et Marie Curie (Paris 6). He received an engineering degree in computer science from the Ecole Supérieure d'Informatique (computer science engineering school) in Paris in 1995. He completed his Master's degree in artificial intelligence in 1997 and obtained his PhD in 2001 at the University of Pierre et Marie Curie. His research interests include statistical learning and text mining.
Patrick Gallinari is currently a professor at the University of Pierre et Marie Curie (Paris 6) and head of the Computer Science Laboratory (LIP6). His main research activity over the last 15 years has been in the field of statistical machine learning. He has also contributed to developing machine learning techniques for different application domains such as information retrieval and text mining, user modelling, man–machine interaction, and pen interfaces.
Cite this article
Amini, M.R., Gallinari, P. Semi-supervised learning with an imperfect supervisor. Knowl Inf Syst 8, 385–413 (2005). https://doi.org/10.1007/s10115-005-0219-4