Metric entropy and minimax risk in classification

  • Pattern Matching and Learning
  • Chapter

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1261)

Abstract

We apply recent results on the minimax risk in density estimation to the related problem of pattern classification. The notion of loss we seek to minimize is an information theoretic measure of how well we can predict the classification of future examples, given the classification of previously seen examples. We give an asymptotic characterization of the minimax risk in terms of the metric entropy properties of the class of distributions that might be generating the examples. We then use these results to characterize the minimax risk in the special case of noisy two-valued classification problems in terms of the Assouad density and the Vapnik-Chervonenkis dimension.
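
To make the quantities in the abstract concrete, here is a minimal sketch, in standard notation rather than the chapter's own, of the cumulative log-loss (relative entropy) risk, its minimax value over the class of candidate distributions, and the metric entropy appearing in the characterization; the symbols Z_t, \hat{P}_t and the metric d are illustrative assumptions, not taken from the chapter.

% A sketch under assumed notation: Z_1, ..., Z_n are examples drawn i.i.d. from an
% unknown distribution P in a known class \mathcal{P}, and \hat{P}_t is the learner's
% predictive distribution for Z_t after observing Z_1, ..., Z_{t-1}.

% Cumulative log-loss (relative entropy) risk of a prediction strategy \hat{P} against P:
\[
  R_n(\hat{P}, P) \;=\; \mathbb{E}_P\!\left[ \sum_{t=1}^{n}
      \log \frac{P(Z_t)}{\hat{P}_t(Z_t \mid Z_1, \dots, Z_{t-1})} \right]
\]

% Minimax cumulative risk over the class:
\[
  r_n(\mathcal{P}) \;=\; \inf_{\hat{P}} \; \sup_{P \in \mathcal{P}} \; R_n(\hat{P}, P)
\]

% Metric entropy of \mathcal{P} at scale \epsilon under a metric d
% (Hellinger distance is the usual choice in this setting):
\[
  \mathcal{H}_d(\epsilon, \mathcal{P}) \;=\; \log N_d(\epsilon, \mathcal{P}),
  \qquad
  N_d(\epsilon, \mathcal{P}) \;=\; \min\bigl\{ |S| : S \subseteq \mathcal{P},\;
      \forall P \in \mathcal{P} \; \exists Q \in S,\; d(P, Q) \le \epsilon \bigr\}
\]

In this notation, the characterization announced above relates the growth of r_n(\mathcal{P}) in n to the growth of \mathcal{H}_d(\epsilon, \mathcal{P}) as \epsilon tends to 0.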




Editor information

Jan Mycielski, Grzegorz Rozenberg, Arto Salomaa


Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Haussler, D., Opper, M. (1997). Metric entropy and minimax risk in classification. In: Mycielski, J., Rozenberg, G., Salomaa, A. (eds) Structures in Logic and Computer Science. Lecture Notes in Computer Science, vol 1261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63246-8_13

  • DOI: https://doi.org/10.1007/3-540-63246-8_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63246-7

  • Online ISBN: 978-3-540-69242-3

  • eBook Packages: Springer Book Archive
