Computational Complexity of Probabilistic Disambiguation

Sima'an, Khalil

doi:10.1023/A:1016340700671

Published: August 2002

Grammars, Volume 5, pages 125–151 (2002)

Abstract

Recent models of natural language processing employ statistical reasoning to deal with the ambiguity of formal grammars. In this approach, statistics concerning the linguistic phenomena of interest are gathered from actual linguistic data and used to estimate the probabilities of the various entities that a given grammar generates, e.g., derivations, parse-trees and sentences. Extending grammars with probabilities makes it possible to state ambiguity resolution as a constrained optimization problem: maximize the probability of some entity that the grammar generates, given the input (e.g., the maximum-probability parse-tree for a given input sentence). Devising efficient algorithms for these optimization problems, however, is not always straightforward. In this paper, we address the computational complexity of ambiguity resolution under various kinds of probabilistic models. We prove that some frequently occurring problems of ambiguity resolution are NP-complete. These problems arise in various applications, e.g., language understanding for text- and speech-based applications. Assuming the common model of computation, this result implies that for many existing probabilistic models it is not possible to devise tractable algorithms for solving these optimization problems.
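As an illustration only (not the paper's construction or proofs), the optimization view can be written as T* = argmax_T P(T | W), which for a fixed input sentence W selects the same tree as argmax_T P(T, W). In models where several derivations can yield the same parse-tree, such as the stochastic tree-substitution grammars used in Data Oriented Parsing, the probability of a tree is the sum of the probabilities of its derivations, so the most probable derivation and the most probable parse-tree need not coincide. The Python sketch below assumes a hypothetical, hand-made list of derivations with made-up probabilities (the names DERIVATIONS, most_probable_derivation and most_probable_parse are illustrative) and enumerates them by brute force. Real inputs can have exponentially many derivations, so this enumeration does not scale.

```python
from collections import defaultdict

# Hypothetical toy data: each derivation of one input sentence, the parse-tree
# it yields, and its probability. The numbers are made up for illustration.
DERIVATIONS = [
    ("d1", "T_a", 0.20),
    ("d2", "T_b", 0.15),
    ("d3", "T_b", 0.10),
    ("d4", "T_b", 0.05),
]


def most_probable_derivation(derivations):
    """Return the single derivation with the highest probability (a plain max)."""
    return max(derivations, key=lambda d: d[2])


def most_probable_parse(derivations):
    """Return the tree maximizing P(T), where P(T) sums P(d) over derivations d of T."""
    tree_prob = defaultdict(float)
    for _name, tree, prob in derivations:
        # Derivations that yield the same tree contribute additively to its probability.
        tree_prob[tree] += prob
    best_tree = max(tree_prob, key=tree_prob.get)
    return best_tree, tree_prob[best_tree]


if __name__ == "__main__":
    print(most_probable_derivation(DERIVATIONS))  # ('d1', 'T_a', 0.2)
    print(most_probable_parse(DERIVATIONS))
```

Here the single best derivation d1 yields T_a with probability 0.20, while T_b collects 0.15 + 0.10 + 0.05 = 0.30 from three derivations and is the most probable parse-tree. Maximizing over derivations is therefore not a substitute for maximizing over trees; the sketch only shows this gap and does not reproduce the NP-completeness results.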

Author information

Authors and Affiliations

Institute for Logic, Language and Computation (ILLC), University of Amsterdam, Room B.234, Nieuwe Achtergracht 166, 1018 WV, Amsterdam, The Netherlands
Khalil Sima'an

About this article

Cite this article

Sima'an, K. Computational Complexity of Probabilistic Disambiguation. Grammars 5, 125–151 (2002). https://doi.org/10.1023/A:1016340700671

Issue Date: August 2002
DOI: https://doi.org/10.1023/A:1016340700671

Similar content being viewed by others

An Ambiguity Hierarchy of Weighted Context-Free Grammars

From Ambiguous Regular Expressions to Deterministic Parsing Automata

The Complexity of Inferences and Explanations in Probabilistic Logic Programming
