Abstract
We consider the problem of learning classifiers from a small set of labeled examples and a large set of unlabeled examples. This situation arises in many applications, e.g., classifying medical images, web pages, and sensor data, where labeling examples is difficult and expensive while unlabeled examples are much easier to acquire. We assume that the training data follow a mixture model with Gaussian components. We propose an approach that selects typical examples for learning classifiers, where the typicality measure is defined with respect to the labeled data via the Mahalanobis squared distance, and we describe an algorithm for selecting typical examples. The basic idea is to draw a training example at random and measure its typicality; if the typicality exceeds a threshold, the example is retained. The number of typical examples retained is bounded by the available memory capacity.
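The selection loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a single Gaussian fitted to the labeled data and converts the Mahalanobis squared distance D² into a typicality score via exp(-D²/2), a monotone transform chosen here for illustration; the function and parameter names are hypothetical.

```python
import numpy as np

def mahalanobis_sq(x, mean, cov_inv):
    """Mahalanobis squared distance of x from a Gaussian with the given
    mean and inverse covariance."""
    d = x - mean
    return float(d @ cov_inv @ d)

def select_typical(unlabeled, labeled, capacity, threshold, seed=0):
    """Randomly draw unlabeled examples and retain those whose typicality
    (here exp(-D^2/2), an assumed transform of the Mahalanobis squared
    distance to the labeled data) exceeds `threshold`, stopping once
    `capacity` examples have been retained."""
    mean = labeled.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(labeled, rowvar=False))
    rng = np.random.default_rng(seed)
    selected = []
    for i in rng.permutation(len(unlabeled)):   # draw examples at random
        if len(selected) >= capacity:           # memory-capacity limit
            break
        d2 = mahalanobis_sq(unlabeled[i], mean, cov_inv)
        if np.exp(-0.5 * d2) > threshold:       # typical enough to keep
            selected.append(unlabeled[i])
    return np.array(selected)
```

A typical call would fit the labeled sample once and then stream unlabeled examples through the threshold test, so memory usage is bounded by `capacity` rather than by the size of the unlabeled pool.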
© 2000 Springer-Verlag Berlin Heidelberg
Cite this paper
Han, J., Cercone, N. (2000). Typical Example Selection for Learning Classifiers. In: Hamilton, H.J. (ed.) Advances in Artificial Intelligence. Canadian AI 2000. Lecture Notes in Computer Science, vol 1822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45486-1_29
Print ISBN: 978-3-540-67557-0
Online ISBN: 978-3-540-45486-1