Maximum Entropy Models for Natural Language Processing

Reference work entry
Encyclopedia of Machine Learning

Synonyms

Log-linear models; Maxent models; Statistical natural language processing

Definition

The term maximum entropy refers to an optimization framework in which the goal is to find the probability model that maximizes entropy over the set of models that are consistent with the observed evidence.

The information-theoretic notion of entropy is a way to quantify the uncertainty of a probability model; higher entropy corresponds to more uncertainty in the probability distribution. The rationale for choosing the maximum entropy model from the set of models consistent with the evidence is that any other choice implicitly assumes evidence that has not been observed (Jaynes, 1957).
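
Stated as a constrained optimization problem (the following is a minimal sketch in standard notation, after Berger et al., 1996, rather than a quotation from this entry), the entropy of a probability distribution p is

\[
H(p) = -\sum_{x} p(x)\,\log p(x),
\]

and the maximum entropy model is the most uncertain member of the set of models whose feature expectations match those observed in the data:

\[
p^{*} = \arg\max_{p \in \mathcal{C}} H(p),
\qquad
\mathcal{C} = \bigl\{\, p : \mathbb{E}_{p}[f_i] = \mathbb{E}_{\tilde{p}}[f_i],\ i = 1, \dots, k \,\bigr\},
\]

where each f_i is a feature function encoding one piece of observed evidence and \tilde{p} is the empirical distribution of the data.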

In most natural language processing problems, observed evidence takes the form of co-occurrence counts between some prediction of interest and some linguistic context of interest. These counts are derived from a large number of linguistically annotated examples, known as a corpus. For example, the frequency in a large corpus...
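
In the conditional form that is standard in NLP applications (predicting an outcome y, such as a tag or label, from a linguistic context x), these counts define empirical feature expectations that the model is constrained to match. The following sketch gives that standard formulation, after Berger et al. (1996); it is illustrative and not a continuation of the truncated passage above:

\[
\mathbb{E}_{\tilde{p}}[f_i] = \sum_{x,\,y} \tilde{p}(x, y)\, f_i(x, y),
\qquad
\mathbb{E}_{p}[f_i] = \sum_{x,\,y} \tilde{p}(x)\, p(y \mid x)\, f_i(x, y).
\]

The maximum entropy solution subject to these constraints has the log-linear form that gives the framework its synonym:

\[
p_{\lambda}(y \mid x) = \frac{1}{Z_{\lambda}(x)} \exp\Bigl(\sum_{i} \lambda_{i} f_i(x, y)\Bigr),
\qquad
Z_{\lambda}(x) = \sum_{y'} \exp\Bigl(\sum_{i} \lambda_{i} f_i(x, y')\Bigr),
\]

where the weights \lambda_i are typically estimated by generalized iterative scaling (Darroch & Ratcliff, 1972) or by gradient-based methods (Malouf, 2002).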

Recommended Reading

  • Berger, A. L., Della Pietra, S. A., & Della Pietra, V. J. (1996). A maximum entropy approach to natural language processing. Computational Linguistics, 22(1), 39–71.

  • Borthwick, A. (1999). A maximum entropy approach to named entity recognition. Unpublished doctoral dissertation, New York University.

  • Chen, S., & Rosenfeld, R. (1999). A Gaussian prior for smoothing maximum entropy models (Tech. Rep. No. CMU-CS-99-108). Carnegie Mellon University.

  • Church, K. W., & Mercer, R. L. (1993). Introduction to the special issue on computational linguistics using large corpora. Computational Linguistics, 19(1), 1–24.

  • Curran, J., & Clark, S. (2003). Investigating GIS and smoothing for maximum entropy taggers. In Proceedings of the 11th Annual Meeting of the European Chapter of the Association for Computational Linguistics (EACL’03) (pp. 91–98). Budapest, Hungary.

  • Darroch, J., & Ratcliff, D. (1972). Generalized iterative scaling for log-linear models. The Annals of Mathematical Statistics, 43(5), 1470–1480.

  • Goodman, J. (2002). Sequential conditional generalized iterative scaling. In Proceedings of the Association for Computational Linguistics (ACL) (pp. 9–16). Philadelphia, Pennsylvania.

  • Ittycheriah, A., Franz, M., Zhu, W., & Ratnaparkhi, A. (2001). Question answering using maximum-entropy components. In Proceedings of the North American Association for Computational Linguistics (NAACL), Pittsburgh, Pennsylvania.

  • Jaynes, E. T. (1957, May). Information theory and statistical mechanics. Physical Review, 106(4), 620–630.

  • Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning (pp. 282–289). San Francisco: Morgan Kaufmann.

  • Lau, R., Rosenfeld, R., & Roukos, S. (1993). Adaptive language modeling using the maximum entropy principle. In Proceedings of the ARPA Human Language Technology Workshop (pp. 108–113). San Francisco: Morgan Kaufmann.

  • Malouf, R. (2002). A comparison of algorithms for maximum entropy parameter estimation. In Sixth Conference on Natural Language Learning (CoNLL) (pp. 49–55). Taipei, Taiwan.

  • Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.

  • Ratnaparkhi, A. (1996). A maximum entropy model for part-of-speech tagging. In E. Brill & K. Church (Eds.), Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 133–142). Somerset, NJ: Association for Computational Linguistics.

  • Ratnaparkhi, A. (1999). Learning to parse natural language with maximum entropy models. Machine Learning, 34(1–3), 151–175.

  • Sha, F., & Pereira, F. (2003). Shallow parsing with conditional random fields. In Proceedings of the Human Language Technology Conference (HLT-NAACL) (pp. 213–220). Edmonton, Canada.

Copyright information

© 2011 Springer Science+Business Media, LLC

Cite this entry

Ratnaparkhi, A. (2011). Maximum Entropy Models for Natural Language Processing. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_520
