Abstract
Many state-of-the-art statistical parsers for English can be viewed as Probabilistic Context-Free Grammars (PCFGs) acquired from treebanks of phrase-structure trees enriched with contextual, derivational (e.g., markovization), and lexical information. In this paper we empirically investigate the applicability and adequacy of the unlexicalized variety of such parsing models to Modern Hebrew (MH), a Semitic language that differs in structure and characteristics from English. We show that, contrary to experience with parsing the WSJ, the markovized, head-driven unlexicalized variety does not necessarily outperform plain PCFGs for Semitic languages. We demonstrate that, on the MH treebank, enriching unlexicalized PCFGs with morphologically marked agreement features (e.g., definiteness) percolated up the parse tree outperforms plain PCFGs as well as a simple head-driven variation. We further show that an (unlexicalized) head-driven variety enriched with the same features achieves even better performance. We conclude that morphologically rich languages introduce an additional dimension of parametrization that is orthogonal to the horizontal/vertical dimensions proposed before [1], and that its contribution is essential and complementary.
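To make the terminology concrete, the following is a minimal sketch of how a treebank PCFG can be read off by relative-frequency estimation, with two optional tree transforms applied first: parent annotation (one common form of vertical markovization) and percolation of a definiteness mark from a head child to its NP. It is an illustration under stated assumptions, not the authors' implementation: the tuple-based tree encoding, the `+D` suffix, the "first child is the head" rule, and the two-sentence toy treebank with English glosses are all hypothetical.

```python
from collections import Counter

# Assumed encoding: a phrasal node is (label, [children]);
# a preterminal is (tag, word) where word is a plain string.
def is_preterminal(tree):
    return isinstance(tree[1], str)

def parent_annotate(tree, parent="ROOT"):
    """Vertical markovization sketch: append the parent label to phrasal labels."""
    label, rest = tree
    if is_preterminal(tree):
        return tree
    return (f"{label}^{parent}", [parent_annotate(c, label) for c in rest])

def percolate(tree, feature="+D", targets=("NP",)):
    """Copy a feature suffix from the head child up to its parent.

    Assumption: the head is simply the first child, and only labels listed
    in `targets` receive the feature. Real head rules are richer.
    """
    label, rest = tree
    if is_preterminal(tree):
        return tree
    kids = [percolate(c, feature, targets) for c in rest]
    if label in targets and kids[0][0].endswith(feature):
        label += feature
    return (label, kids)

def rules(tree):
    """Yield (lhs, rhs) productions, including lexical ones."""
    label, rest = tree
    if is_preterminal(tree):
        yield (label, (rest,))
    else:
        yield (label, tuple(c[0] for c in rest))
        for c in rest:
            yield from rules(c)

def estimate_pcfg(treebank):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    counts = Counter(r for t in treebank for r in rules(t))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {r: n / lhs_totals[r[0]] for r, n in counts.items()}

# Hypothetical toy treebank (English glosses, not treebank data).
t1 = ("S", [("NP", [("NN+D", "the-boy")]), ("VP", [("VB", "ate")])])
t2 = ("S", [("NP", [("NN", "boy")]), ("VP", [("VB", "ate")])])
treebank = [t1, t2]

plain = estimate_pcfg(treebank)
percolated = estimate_pcfg([percolate(t) for t in treebank])
annotated = parent_annotate(t1)
```

In the plain grammar, `NP` rewrites to a definite or an indefinite noun with equal probability. After percolation the definite and indefinite cases become distinct categories (`NP+D` and `NP`), so the distinction is visible to the rule that expands `S`.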
References
Klein, D., Manning, C.: Accurate Unlexicalized Parsing. In: Proceedings of ACL 2003, pp. 423–430 (2003)
Sima’an, K., Itai, A., Winter, Y., Altman, A., Nativ, N.: Building a Tree-Bank of Modern Hebrew Text. In: Traitement Automatique des Langues (2001)
Tsarfaty, R.: Integrated Morphological and Syntactic Disambiguation for Modern Hebrew. In: Proceedings of SRW COLING-ACL (2006)
Bikel, D.: Intricacies of Collins’ Parsing Model. Computational Linguistics 30(4) (2004)
Charniak, E.: Tree-Bank Grammars. In: AAAI/IAAI, vol. 2, pp. 1031–1036 (1996)
Johnson, M.: PCFG Models of Linguistic Tree Representations. Computational Linguistics 24(4), 613–632 (1998)
Collins, M.: Head-Driven Statistical Models for Natural Language Parsing. Computational Linguistics (2003)
Dubey, A., Keller, F.: Probabilistic Parsing for German using Sister-Head Dependencies. In: Proceedings of ACL 2003 (2003)
Collins, M., Hajic, J., Ramshaw, L., Tillmann, C.: A Statistical Parser for Czech. In: Proceedings of ACL, College Park, Maryland (1999)
Bikel, D., Chiang, D.: Two Statistical Parsing Models Applied to the Chinese Treebank. In: Second Chinese Language Processing Workshop, Hong Kong (2000)
Wintner, S.: Definiteness in the Hebrew Noun Phrase. Journal of Linguistics 36, 319–363 (2000)
Goldberg, Y., Adler, M., Elhadad, M.: Noun Phrase Chunking in Hebrew: Influence of Lexical and Morphological Features. In: Proceedings of COLING-ACL (2006)
Danon, G.: Syntactic Definiteness in the Grammar of Modern Hebrew. Linguistics 39(6), 1071–1116 (2001)
Marcus, M., Kim, G., Marcinkiewicz, M., MacIntyre, R., Bies, A., Ferguson, M., Katz, K., Schasberger, B.: The Penn Treebank: Annotating Predicate-Argument Structure (1994)
Milea, A.: Treebank Annotation Guide. MILA, Knowledge Center for Hebrew Processing (2007)
Hageloh, F.: Parsing using Transforms over Treebanks. Master’s thesis, University of Amsterdam (2007)
Schmid, H.: Efficient Parsing of Highly Ambiguous Context-Free Grammars with Bit Vectors. In: Proceedings of ACL (2004)
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Tsarfaty, R., Sima’an, K. (2007). Accurate Unlexicalized Parsing for Modern Hebrew. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science, vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74627-0
Online ISBN: 978-3-540-74628-7
eBook Packages: Computer Science, Computer Science (R0)