Maximum Entropy Models for Natural Language Processing

Reference work entry
Encyclopedia of Machine Learning

Synonyms

Log-linear models; Maxent models; Statistical natural language processing

Definition

The term maximum entropy refers to an optimization framework in which the goal is to find the probability model that maximizes entropy over the set of models that are consistent with the observed evidence.

The information-theoretic notion of entropy is a way to quantify the uncertainty of a probability model; higher entropy corresponds to more uncertainty in the probability distribution. The rationale for choosing the maximum entropy model from the set of models consistent with the evidence is that any other choice implicitly assumes evidence that has not been observed (Jaynes, 1957).
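
Stated as a constrained optimization problem (the following is a minimal sketch in standard notation, after Berger et al., 1996, rather than a quotation from this entry), the entropy of a probability distribution p is

\[
H(p) = -\sum_{x} p(x)\,\log p(x),
\]

and the maximum entropy model is the most uncertain member of the set of models whose feature expectations match those observed in the data:

\[
p^{*} = \arg\max_{p \in \mathcal{C}} H(p),
\qquad
\mathcal{C} = \bigl\{\, p : \mathbb{E}_{p}[f_i] = \mathbb{E}_{\tilde{p}}[f_i],\ i = 1, \dots, k \,\bigr\},
\]

where each f_i is a feature function encoding one piece of observed evidence and \tilde{p} is the empirical distribution of the data.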

In most natural language processing problems, observed evidence takes the form of co-occurrence counts between some prediction of interest and some linguistic context of interest. These counts are derived from a large number of linguistically annotated examples, known as a corpus. For example, the frequency in a large corpus...
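
In the conditional form that is standard in NLP applications (predicting an outcome y, such as a tag or label, from a linguistic context x), these counts define empirical feature expectations that the model is constrained to match. The following sketch gives that standard formulation, after Berger et al. (1996); it is illustrative and not a continuation of the truncated passage above:

\[
\mathbb{E}_{\tilde{p}}[f_i] = \sum_{x,\,y} \tilde{p}(x, y)\, f_i(x, y),
\qquad
\mathbb{E}_{p}[f_i] = \sum_{x,\,y} \tilde{p}(x)\, p(y \mid x)\, f_i(x, y).
\]

The maximum entropy solution subject to these constraints has the log-linear form that gives the framework its synonym:

\[
p_{\lambda}(y \mid x) = \frac{1}{Z_{\lambda}(x)} \exp\Bigl(\sum_{i} \lambda_{i} f_i(x, y)\Bigr),
\qquad
Z_{\lambda}(x) = \sum_{y'} \exp\Bigl(\sum_{i} \lambda_{i} f_i(x, y')\Bigr),
\]

where the weights \lambda_i are typically estimated by generalized iterative scaling (Darroch & Ratcliff, 1972) or by gradient-based methods (Malouf, 2002).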

Recommended Reading

  • Berger, A. L., Della Pietra, S. A., & Della Pietra, V. J. (1996). A maximum entropy approach to natural language processing. Computational Linguistics, 22(1), 39–71.

  • Borthwick, A. (1999). A maximum entropy approach to named entity recognition. Unpublished doctoral dissertation, New York University.

  • Chen, S., & Rosenfeld, R. (1999). A Gaussian prior for smoothing maximum entropy models (Tech. Rep. No. CMU-CS-99-108). Carnegie Mellon University.

  • Church, K. W., & Mercer, R. L. (1993). Introduction to the special issue on computational linguistics using large corpora. Computational Linguistics, 19(1), 1–24.

  • Curran, J., & Clark, S. (2003). Investigating GIS and smoothing for maximum entropy taggers. In Proceedings of the 11th Annual Meeting of the European Chapter of the Association for Computational Linguistics (EACL’03) (pp. 91–98). Budapest, Hungary.

  • Darroch, J., & Ratcliff, D. (1972). Generalized iterative scaling for log-linear models. The Annals of Mathematical Statistics, 43(5), 1470–1480.

  • Goodman, J. (2002). Sequential conditional generalized iterative scaling. In Proceedings of the Association for Computational Linguistics (ACL) (pp. 9–16). Philadelphia, Pennsylvania.

  • Ittycheriah, A., Franz, M., Zhu, W., & Ratnaparkhi, A. (2001). Question answering using maximum-entropy components. In Proceedings of the North American Association for Computational Linguistics (NAACL), Pittsburgh, Pennsylvania.

  • Jaynes, E. T. (1957, May). Information theory and statistical mechanics. Physical Review, 106(4), 620–630.

  • Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning (pp. 282–289). San Francisco: Morgan Kaufmann.

  • Lau, R., Rosenfeld, R., & Roukos, S. (1993). Adaptive language modeling using the maximum entropy principle. In Proceedings of the ARPA Human Language Technology Workshop (pp. 108–113). San Francisco: Morgan Kaufmann.

  • Malouf, R. (2002). A comparison of algorithms for maximum entropy parameter estimation. In Sixth Conference on Natural Language Learning (CoNLL) (pp. 49–55). Taipei, Taiwan.

  • Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.

  • Ratnaparkhi, A. (1996). A maximum entropy model for part-of-speech tagging. In E. Brill & K. Church (Eds.), Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 133–142). Somerset, NJ: Association for Computational Linguistics.

  • Ratnaparkhi, A. (1999). Learning to parse natural language with maximum entropy models. Machine Learning, 34(1–3), 151–175.

  • Sha, F., & Pereira, F. (2003). Shallow parsing with conditional random fields. In Proceedings of the Human Language Technology Conference (HLT-NAACL) (pp. 213–220). Edmonton, Canada.

Copyright information

© 2011 Springer Science+Business Media, LLC

Cite this entry

Ratnaparkhi, A. (2011). Maximum Entropy Models for Natural Language Processing. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_520
