Abstract
The Data Oriented Parsing (DOP) model currently achieves state-of-the-art parsing accuracy on benchmark corpora. However, existing DOP parameter estimation methods are known to be biased, and ad hoc adjustments are needed to reduce the effect of this bias on performance. This paper presents a novel estimation procedure that exploits a distinctive property of DOP: different derivations can generate the same parse tree. We show that these derivations represent different “Markov orders”, which the DOP model interpolates. The present method instead combines the derivation orders by backoff, which allows the parameters to be estimated with Katz backoff. We report on experiments showing an error reduction of up to 15% with respect to earlier methods.
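To make the contrast between backoff and interpolation concrete, the following is a minimal sketch of Katz-style backoff in its original word n-gram setting, not the DOP estimator described in this paper. It assumes a bigram model that backs off to unigrams, and it substitutes a fixed absolute discount for the Good-Turing discounting used in Katz's formulation. The names `train_backoff_bigram`, `p_backoff` and `discount` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def train_backoff_bigram(tokens, discount=0.5):
    """Return p(w | prev) for a Katz-style backoff bigram model.

    Assumption: a fixed absolute discount stands in for the
    Good-Turing discounts of the original Katz scheme.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())

    def p_unigram(w):
        # Lower-order (backed-off) estimate: relative frequency.
        return unigrams[w] / total

    def p_backoff(w, prev):
        # How often prev occurs as a history, and which words follow it.
        context_count = sum(c for (a, _), c in bigrams.items() if a == prev)
        if context_count == 0:
            # Unseen history: use the lower-order model directly.
            return p_unigram(w)
        if bigrams[(prev, w)] > 0:
            # Seen bigram: discounted higher-order estimate only.
            return (bigrams[(prev, w)] - discount) / context_count
        # Unseen bigram: share the reserved mass among unseen words,
        # in proportion to their unigram probabilities.
        seen = {b for (a, b) in bigrams if a == prev}
        reserved = discount * len(seen) / context_count
        unseen_mass = sum(p_unigram(v) for v in unigrams if v not in seen)
        if unseen_mass == 0:
            return 0.0
        return reserved * p_unigram(w) / unseen_mass

    return p_backoff


if __name__ == "__main__":
    p = train_backoff_bigram("a b a c a b b a".split())
    for prev in "abc":
        print(prev, {w: round(p(w, prev), 4) for w in "abc"})
```

Unlike interpolation, which mixes the higher-order and lower-order estimates for every event, this scheme consults the lower-order model only when the higher-order event was not observed, and then only with the probability mass freed by discounting.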
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Buratto, L., Sima’an, K. (2003). Backoff DOP: Parameter Estimation by Backoff. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2003. Lecture Notes in Computer Science, vol 2807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39398-6_6
DOI: https://doi.org/10.1007/978-3-540-39398-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20024-6
Online ISBN: 978-3-540-39398-6
eBook Packages: Springer Book Archive