Abstract
We present a unified and complete account of maximum entropy distribution estimation subject to constraints represented by convex potential functions or, alternatively, by convex regularization. We provide fully general performance guarantees and an algorithm with a complete convergence proof. As special cases, we can easily derive performance guarantees for many known regularization types, including \(\ell_1\), \(\ell_2\), \(\ell_2^2\), and \(\ell_1 + \ell_2^2\) style regularization. Furthermore, our general approach enables us to use information about the structure of the feature space or about sample selection bias to derive entirely new regularization functions with superior guarantees. We propose an algorithm solving a large and general subclass of generalized maxent problems, including all discussed in the paper, and prove its convergence. Our approach generalizes techniques based on information geometry and Bregman divergences as well as those based more directly on compactness.
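To make the setting concrete, the following is a minimal illustrative sketch (not the paper's algorithm) of \(\ell_1\)-regularized maxent over a finite sample space: the Gibbs model \(p_\lambda(x) \propto \exp(\lambda \cdot f(x))\) is fit by proximal gradient descent on the regularized dual \(\log Z(\lambda) - \lambda \cdot \hat{E}[f] + \beta\|\lambda\|_1\). The function name, step-size rule, and parameter names are our own assumptions for this sketch.

```python
import numpy as np

def maxent_l1(features, emp_mean, beta=0.1, steps=5000):
    """Fit a Gibbs distribution p(x) ∝ exp(λ·f(x)) on a finite sample space
    by minimizing the ℓ1-regularized maxent dual
        L(λ) = log Z(λ) − λ·Ê[f] + β‖λ‖₁
    with proximal gradient descent (gradient step + soft-thresholding).

    features: (n, d) array, row x holds the feature vector f(x).
    emp_mean: (d,) array, the empirical feature expectations Ê[f].
    """
    n, d = features.shape
    # Safe step size: the Hessian of log Z is Cov_p(f) ⪯ max_x ‖f(x)‖² · I.
    lr = 1.0 / np.max(np.sum(features ** 2, axis=1))
    lam = np.zeros(d)
    for _ in range(steps):
        logits = features @ lam
        logits -= logits.max()              # numerical stability
        p = np.exp(logits)
        p /= p.sum()                        # current model distribution p_λ
        grad = features.T @ p - emp_mean    # ∇(log Z(λ) − λ·Ê[f])
        lam = lam - lr * grad
        # Proximal operator of β‖·‖₁: soft-thresholding.
        lam = np.sign(lam) * np.maximum(np.abs(lam) - lr * beta, 0.0)
    return lam, p
```

At the optimum, the KKT conditions force the model's feature expectations to match the empirical ones only up to \(\beta\), i.e. \(|E_{p_\lambda}[f_j] - \hat{E}[f_j]| \le \beta\) for each coordinate; this relaxed matching is exactly the kind of box constraint whose guarantees the paper analyzes.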
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Dudík, M., Schapire, R.E. (2006). Maximum Entropy Distribution Estimation with Generalized Regularization. In: Lugosi, G., Simon, H.U. (eds) Learning Theory. COLT 2006. Lecture Notes in Computer Science(), vol 4005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11776420_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35294-5
Online ISBN: 978-3-540-35296-9