Definition
The term maximum entropy refers to an optimization framework in which the goal is to find the probability model that maximizes entropy over the set of models that are consistent with the observed evidence.
The information-theoretic notion of entropy is a way to quantify the uncertainty of a probability model; higher entropy corresponds to more uncertainty in the probability distribution. The rationale for choosing the maximum entropy model – from the set of models that meet the evidence – is that any other consistent model implicitly assumes information that has not been observed (Jaynes, 1957).
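The principle can be illustrated with a small numerical sketch (a hypothetical three-outcome model, not taken from the entry): among candidate distributions that all satisfy a single observed constraint, the maximum entropy choice is the one that spreads the remaining probability mass uniformly, assuming nothing beyond the evidence.

```python
import math

def entropy(p):
    # Shannon entropy in bits; terms with zero probability contribute nothing
    return -sum(x * math.log2(x) for x in p if x > 0)

# Hypothetical three-outcome model with one observed constraint: P(outcome 0) = 0.3.
# Every candidate below satisfies the constraint; the maximum entropy principle
# selects the one that splits the remaining 0.7 uniformly over the other outcomes.
candidates = [
    (0.3, 0.7, 0.0),    # assumes outcome 2 never occurs
    (0.3, 0.5, 0.2),    # assumes a skew not supported by the evidence
    (0.3, 0.35, 0.35),  # maximum entropy choice
]
best = max(candidates, key=entropy)
print(best)  # (0.3, 0.35, 0.35)
```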
In most natural language processing problems, observed evidence takes the form of co-occurrence counts between some prediction of interest and some linguistic context of interest. These counts are derived from a large number of linguistically annotated examples, known as a corpus. For example, the frequency in a large corpus...
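In this setting, model parameters are typically fit so that the model's expected feature values match the empirical expectations derived from the corpus counts. The following is a minimal sketch of generalized iterative scaling (Darroch & Ratcliff, 1972; listed in the Recommended Reading below) on a toy problem; the four outcomes, two binary features, and target expectations are invented for illustration only.

```python
import math

def gis(outcomes, features, targets, iters=2000):
    """Generalized iterative scaling for a maximum entropy model
    p(x) proportional to exp(sum_i lam[i] * f_i(x))."""
    # GIS requires feature values to sum to a constant C for every outcome,
    # so a slack feature is appended to make the totals equal.
    C = max(sum(f(x) for f in features) for x in outcomes)
    feats = list(features) + [lambda x: C - sum(f(x) for f in features)]
    targs = list(targets) + [C - sum(targets)]
    lam = [0.0] * len(feats)

    def dist():
        scores = [math.exp(sum(l * f(x) for l, f in zip(lam, feats)))
                  for x in outcomes]
        z = sum(scores)
        return [s / z for s in scores]

    for _ in range(iters):
        p = dist()
        for i, f in enumerate(feats):
            model = sum(px * f(x) for px, x in zip(p, outcomes))
            # Scale each weight toward matching the empirical expectation
            lam[i] += math.log(targs[i] / model) / C
    return dict(zip(outcomes, dist()))

# Hypothetical data: four contexts, two overlapping binary features, and
# target expectations standing in for empirical co-occurrence frequencies.
outcomes = ["a", "b", "c", "d"]
features = [
    lambda x: float(x in ("a", "b")),
    lambda x: float(x in ("b", "c")),
]
p = gis(outcomes, features, [0.6, 0.5])
# After fitting, the model expectations match the targets:
# p["a"] + p["b"] is close to 0.6 and p["b"] + p["c"] is close to 0.5.
```

Among all distributions satisfying these two expectation constraints, the fitted model is the one with maximum entropy; the exponential form is a consequence of that optimization, not an extra assumption.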
Recommended Reading
Berger, A. L., Della Pietra, S. A., & Della Pietra, V. J. (1996). A maximum entropy approach to natural language processing. Computational Linguistics, 22(1), 39–71.
Borthwick, A. (1999). A maximum entropy approach to named entity recognition. Unpublished doctoral dissertation, New York University.
Chen, S., & Rosenfeld, R. (1999). A Gaussian prior for smoothing maximum entropy models (Tech. Rep. No. CMU-CS-99-108). Carnegie Mellon University.
Church, K. W., & Mercer, R. L. (1993). Introduction to the special issue on computational linguistics using large corpora. Computational Linguistics, 19(1), 1–24.
Curran, J., & Clark, S. (2003). Investigating GIS and smoothing for maximum entropy taggers. In Proceedings of the 11th Annual Meeting of the European Chapter of the Association for Computational Linguistics (EACL '03) (pp. 91–98). Budapest, Hungary.
Darroch, J., & Ratcliff, D. (1972). Generalized iterative scaling for log-linear models. The Annals of Mathematical Statistics, 43(5), 1470–1480.
Goodman, J. (2002). Sequential conditional generalized iterative scaling. In Proceedings of the Association for Computational Linguistics (ACL) (pp. 9–16). Philadelphia, Pennsylvania.
Ittycheriah, A., Franz, M., Zhu, W., & Ratnaparkhi, A. (2001). Question answering using maximum-entropy components. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL). Pittsburgh, Pennsylvania.
Jaynes, E. T. (1957, May). Information theory and statistical mechanics. Physical Review, 106(4), 620–630.
Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning (pp. 282–289). San Francisco: Morgan Kaufmann.
Lau, R., Rosenfeld, R., & Roukos, S. (1993). Adaptive language modeling using the maximum entropy principle. In Proceedings of the ARPA Human Language Technology Workshop (pp. 108–113). San Francisco: Morgan Kaufmann.
Malouf, R. (2002). A comparison of algorithms for maximum entropy parameter estimation. In Proceedings of the Sixth Conference on Natural Language Learning (CoNLL) (pp. 49–55). Taipei, Taiwan.
Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.
Ratnaparkhi, A. (1996). A maximum entropy model for part-of-speech tagging. In E. Brill & K. Church (Eds.), Proceedings of the conference on empirical methods in natural language processing (pp. 133–142). Somerset, NJ: Association for Computational Linguistics.
Ratnaparkhi, A. (1999). Learning to parse natural language with maximum entropy models. Machine Learning, 34(1–3), 151–175.
Sha, F., & Pereira, F. (2003). Shallow parsing with conditional random fields. In Proceedings of the human language technology conference (HLT-NAACL) (pp. 213–220). Edmonton, Canada.
© 2011 Springer Science+Business Media, LLC
Ratnaparkhi, A. (2011). Maximum Entropy Models for Natural Language Processing. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_520