Computational Complexity of Probabilistic Disambiguation

Grammars

Abstract

Recent models of natural language processing employ statistical reasoning to deal with the ambiguity of formal grammars. In this approach, statistics concerning the linguistic phenomena of interest are gathered from actual linguistic data and used to estimate the probabilities of the entities that a given grammar generates, e.g., derivations, parse-trees and sentences. Extending grammars with probabilities makes it possible to state ambiguity resolution as a constrained optimization formula, which aims at maximizing the probability of some entity that the grammar generates given the input (e.g., the maximum probability parse-tree given some input sentence). The implementation of these optimization formulae in efficient algorithms, however, does not always proceed smoothly. In this paper, we address the computational complexity of ambiguity resolution under various kinds of probabilistic models. We provide proofs that some frequently occurring problems of ambiguity resolution are NP-complete. These problems are encountered in various applications, e.g., language understanding for text- and speech-based applications. Assuming the common model of computation, this result implies that for many existing probabilistic models it is not possible to devise tractable algorithms for solving these optimization problems.
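To make the phrase "maximum probability parse-tree given some input sentence" concrete, the following is a minimal sketch for the simplest case, a plain probabilistic context-free grammar, where this particular optimization can be solved in polynomial time with a Viterbi-style CKY chart. It is not taken from the paper: the grammar, its probabilities, the example sentence and the function name `most_probable_parse` are all invented for illustration, and the NP-complete problems the abstract refers to concern other problems and models that this sketch does not cover.

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form. Every rule and probability here is
# an assumption made up for this illustration.
BINARY = {  # (left child, right child) -> [(parent, rule probability)]
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 0.6)],
    ("VP", "PP"): [("VP", 0.4)],
    ("NP", "PP"): [("NP", 0.2)],
    ("P", "NP"): [("PP", 1.0)],
}
LEXICAL = {  # word -> [(parent, rule probability)]
    "I": [("NP", 0.3)],
    "stars": [("NP", 0.3)],
    "telescopes": [("NP", 0.2)],
    "saw": [("V", 1.0)],
    "with": [("P", 1.0)],
}


def most_probable_parse(words, start="S"):
    """Return (probability, tree) of the best parse, or None if there is none."""
    n = len(words)
    # best[(i, j)][A] = (probability, tree) of the best A-rooted subtree
    # spanning words[i:j].
    best = defaultdict(dict)
    for i, word in enumerate(words):
        for parent, p in LEXICAL.get(word, []):
            best[(i, i + 1)][parent] = (p, (parent, word))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for left, (p_left, t_left) in best[(i, k)].items():
                    for right, (p_right, t_right) in best[(k, j)].items():
                        for parent, p in BINARY.get((left, right), []):
                            cand = p * p_left * p_right
                            current = best[(i, j)].get(parent, (0.0, None))[0]
                            if cand > current:
                                best[(i, j)][parent] = (cand, (parent, t_left, t_right))
    return best[(0, n)].get(start)


if __name__ == "__main__":
    print(most_probable_parse("I saw stars with telescopes".split()))
```

Under this toy grammar the sentence has two parses, one attaching "with telescopes" to the verb phrase and one attaching it to "stars". The chart keeps only the higher-scoring subtree per span and category, so the returned tree is the verb-phrase attachment, with probability about 0.00432 against about 0.00216 for the alternative.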

Cite this article

Sima'an, K. Computational Complexity of Probabilistic Disambiguation. Grammars 5, 125–151 (2002). https://doi.org/10.1023/A:1016340700671

  • DOI: https://doi.org/10.1023/A:1016340700671