Abstract
Recent models of natural language processing employ statistical reasoning to deal with the ambiguity of formal grammars. In this approach, statistics concerning the linguistic phenomena of interest are gathered from actual linguistic data and used to estimate the probabilities of the entities that a given grammar generates, e.g., derivations, parse-trees and sentences. Extending grammars with probabilities makes it possible to state ambiguity resolution as a constrained optimization formula, which aims at maximizing the probability of some entity that the grammar generates given the input (e.g., the maximum probability parse-tree given some input sentence). The implementation of these optimization formulae in efficient algorithms, however, does not always proceed smoothly. In this paper, we address the computational complexity of ambiguity resolution under various kinds of probabilistic models. We prove that some frequently occurring problems of ambiguity resolution are NP-complete. These problems are encountered in various applications, e.g., language understanding for text- and speech-based applications. Assuming the common model of computation, this result implies that, for many existing probabilistic models, it is not possible to devise tractable algorithms for solving these optimization problems.
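As an illustration only, the optimization formula for the maximum probability parse-tree mentioned above could be written as follows. The symbols are assumptions introduced here, not notation taken from the paper: $s$ is the input sentence, $G$ the probabilistic grammar, and $\mathcal{T}_G(s)$ the set of parse-trees that $G$ assigns to $s$.

```latex
T^{*} \;=\; \operatorname*{arg\,max}_{T \in \mathcal{T}_G(s)} P(T \mid s)
      \;=\; \operatorname*{arg\,max}_{T \in \mathcal{T}_G(s)} P(T, s)
```

The second equality holds because $P(s)$ is constant for a fixed input sentence, so dividing by it does not change which tree attains the maximum.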
Cite this article
Sima'an, K. Computational Complexity of Probabilistic Disambiguation. Grammars 5, 125–151 (2002). https://doi.org/10.1023/A:1016340700671