Abstract
We consider the problem of learning classifiers from a small set of labeled examples and a large set of unlabeled examples. This situation arises in many applications, e.g., classifying medical images, web pages, and sensor data, where labeling examples is difficult and expensive while unlabeled examples are much easier to acquire. We assume that the training data follow a mixture model with Gaussian components. We propose an approach that selects typical examples for learning classifiers, where the typicality measure is defined with respect to the labeled data via the Mahalanobis squared distance, and we describe an algorithm for selecting typical examples. The basic idea is to draw a training example at random and measure its typicality; if the typicality exceeds a threshold, the example is retained. The number of typical examples retained is bounded by the available memory capacity.
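The selection loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a single Gaussian fitted to the labeled data and converts the Mahalanobis squared distance D² into a typicality score via exp(-D²/2), a monotone transform chosen here for illustration; the function and parameter names are hypothetical.

```python
import numpy as np

def mahalanobis_sq(x, mean, cov_inv):
    """Mahalanobis squared distance of x from a Gaussian with the given
    mean and inverse covariance."""
    d = x - mean
    return float(d @ cov_inv @ d)

def select_typical(unlabeled, labeled, capacity, threshold, seed=0):
    """Randomly draw unlabeled examples and retain those whose typicality
    (here exp(-D^2/2), an assumed transform of the Mahalanobis squared
    distance to the labeled data) exceeds `threshold`, stopping once
    `capacity` examples have been retained."""
    mean = labeled.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(labeled, rowvar=False))
    rng = np.random.default_rng(seed)
    selected = []
    for i in rng.permutation(len(unlabeled)):   # draw examples at random
        if len(selected) >= capacity:           # memory-capacity limit
            break
        d2 = mahalanobis_sq(unlabeled[i], mean, cov_inv)
        if np.exp(-0.5 * d2) > threshold:       # typical enough to keep
            selected.append(unlabeled[i])
    return np.array(selected)
```

A typical call would fit the labeled sample once and then stream unlabeled examples through the threshold test, so memory usage is bounded by `capacity` rather than by the size of the unlabeled pool.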
© 2000 Springer-Verlag Berlin Heidelberg
Cite this paper
Han, J., Cercone, N. (2000). Typical Example Selection for Learning Classifiers. In: Hamilton, H.J. (ed.) Advances in Artificial Intelligence. Canadian AI 2000. Lecture Notes in Computer Science, vol 1822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45486-1_29
Print ISBN: 978-3-540-67557-0
Online ISBN: 978-3-540-45486-1