
Typical Example Selection for Learning Classifiers

  • Conference paper
  • In: Advances in Artificial Intelligence (Canadian AI 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1822)


Abstract

We consider the problem of learning classifiers from a small labeled example set and a large unlabeled example set. This situation arises in many applications, such as identifying medical images, web pages, or sensing data, where labeling examples is hard and expensive while unlabeled examples are much easier to acquire. We assume that the training data are distributed according to a mixture model with Gaussian components. We propose an approach to selecting typical examples for learning classifiers, where the typicality measure is defined with respect to the labeled data in terms of the squared Mahalanobis distance. An algorithm for selecting typical examples is described. The basic idea is to draw a training example at random and measure its typicality; if the typicality exceeds a threshold, the example is retained. The number of typical examples retained is limited by memory capacity. A minimal sketch of this selection loop is given below.
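The abstract describes the selection loop only in prose, so the following is a minimal sketch of it, assuming one Gaussian component per labeled class and a typicality score of exp(-D²/2), where D² is the squared Mahalanobis distance to the nearest component. The exact typicality definition, thresholds, and all names here (`select_typical`, `threshold`, `capacity`) are illustrative assumptions, not the paper's own code.

```python
import numpy as np

def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance of x from one Gaussian component."""
    diff = x - mean
    return float(diff @ cov_inv @ diff)

def select_typical(unlabeled, class_means, class_covs, threshold, capacity, seed=0):
    """Randomly draw unlabeled examples, keep those whose typicality
    exceeds `threshold`, and stop once `capacity` examples are stored."""
    inv_covs = [np.linalg.inv(c) for c in class_covs]
    rng = np.random.default_rng(seed)
    selected = []
    for i in rng.permutation(len(unlabeled)):   # random draw without replacement
        x = unlabeled[i]
        # Squared Mahalanobis distance to the nearest labeled-class Gaussian.
        d2 = min(mahalanobis_sq(x, m, ic)
                 for m, ic in zip(class_means, inv_covs))
        typicality = np.exp(-0.5 * d2)          # assumed form: 1 at the mean, -> 0 far away
        if typicality > threshold:
            selected.append(x)
            if len(selected) == capacity:       # memory-capacity limit from the abstract
                break
    return np.array(selected)

# Usage sketch: fit one Gaussian per labeled class, then filter the unlabeled pool.
# X_lab: (n, d) labeled examples with labels y_lab; X_unl: (m, d) unlabeled pool.
# means = [X_lab[y_lab == c].mean(axis=0) for c in np.unique(y_lab)]
# covs  = [np.cov(X_lab[y_lab == c], rowvar=False) for c in np.unique(y_lab)]
# typical = select_typical(X_unl, means, covs, threshold=0.05, capacity=1000)
```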




Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Han, J., Cercone, N. (2000). Typical Example Selection for Learning Classifiers. In: Hamilton, H.J. (ed.) Advances in Artificial Intelligence. Canadian AI 2000. Lecture Notes in Computer Science, vol 1822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45486-1_29


  • DOI: https://doi.org/10.1007/3-540-45486-1_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67557-0

  • Online ISBN: 978-3-540-45486-1

