Abstract
In recent years there has been growing interest in techniques that incorporate knowledge from unlabeled data into supervised learning systems. However, the results reported in the literature are disparate, and there is no general consensus that using unlabeled examples always improves classifier performance. This paper proposes a method for incorporating a corpus of unlabeled examples into the supervised training of a neural network classifier and presents results from applying the technique to several datasets from the UCI repository. While the results do not support the claim that unlabeled data improves overall classification accuracy, a bias-variance decomposition shows that classifiers trained with unlabeled data display lower bias and higher variance than classifiers trained on labeled data alone.
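The bias-variance claim in the abstract refers to a decomposition of zero-one loss. The sketch below shows one standard way to estimate such a decomposition empirically, in the style of Kohavi and Wolpert (1996): train the classifier on many random subsamples of the training set, record the distribution of its predictions at each test point, and split the expected zero-one error into a squared-bias term and a variance term. This is a minimal illustration of the general technique, not the paper's experimental protocol; the synthetic dataset, the MLP architecture, and all hyperparameters are assumptions chosen for the example, and deterministic true labels are assumed (no Bayes-noise term).

```python
# Hedged sketch: Kohavi-Wolpert-style bias/variance estimate for zero-one loss.
# Dataset, model, and hyperparameters are illustrative, not those of the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def bias_variance_zero_one(X_train, y_train, X_test, y_test,
                           n_rounds=20, sample_frac=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n_classes = len(np.unique(y_train))
    # votes[i, c]: how often a classifier trained in round t predicted class c
    # for test point i.
    votes = np.zeros((len(X_test), n_classes))
    for t in range(n_rounds):
        idx = rng.choice(len(X_train),
                         size=int(sample_frac * len(X_train)), replace=False)
        clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500,
                            random_state=t).fit(X_train[idx], y_train[idx])
        votes[np.arange(len(X_test)), clf.predict(X_test)] += 1
    p_hat = votes / n_rounds               # empirical P(prediction = c | x)
    p_true = np.eye(n_classes)[y_test]     # deterministic target distribution
    # Kohavi-Wolpert terms, averaged over the test set:
    bias = 0.5 * np.sum((p_true - p_hat) ** 2, axis=1)   # squared-bias term
    variance = 0.5 * (1.0 - np.sum(p_hat ** 2, axis=1))  # variance term
    return bias.mean(), variance.mean()

X, y = make_classification(n_samples=600, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=1)
b, v = bias_variance_zero_one(X_tr, y_tr, X_te, y_te)
print(f"estimated bias: {b:.3f}  variance: {v:.3f}")
```

Running the estimator twice, once on a model trained with labeled data alone and once on a model trained with the unlabeled corpus included, would reproduce the kind of comparison the abstract describes: a lower bias term paired with a higher variance term for the semi-supervised model.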
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Skabar, A. (2002). Augmenting Supervised Neural Classifier Training Using a Corpus of Unlabeled Data. In: Jarke, M., Lakemeyer, G., Koehler, J. (eds.) KI 2002: Advances in Artificial Intelligence. Lecture Notes in Computer Science, vol. 2479. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45751-8_12
DOI: https://doi.org/10.1007/3-540-45751-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44185-4
Online ISBN: 978-3-540-45751-0