Abstract
We build a class-based selection preference sub-model that incorporates external semantic knowledge from two Chinese electronic semantic dictionaries. This sub-model is combined with a modifier-head generation sub-model. After being optimized on held-out data by the EM algorithm, our improved parser achieves an F1 measure of 79.4% on the Penn Chinese Treebank (CTB), a 4.4% relative decrease in error rate. Further analysis of the performance improvement indicates that semantic knowledge helps with nominal compounds, coordination, and N⋄V tagging disambiguation, and that it alleviates the sparseness of information available in the treebank.
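The abstract does not say how the two sub-models are combined or which parameters EM tunes. One common reading is a linear interpolation whose weight is estimated on held-out data; the sketch below illustrates that reading only. The function name, the `(p_sem, p_lex)` input format, and the single shared weight `lam` are assumptions made for illustration, not details taken from the paper.

```python
from typing import Sequence, Tuple


def em_interpolation_weight(
    heldout: Sequence[Tuple[float, float]],
    lam: float = 0.5,
    iterations: int = 50,
) -> float:
    """Estimate lam in p = lam * p_sem + (1 - lam) * p_lex by EM.

    Each held-out item is a pair (p_sem, p_lex): the probabilities that two
    hypothetical sub-models (semantic-class and lexical) assign to the same
    observed modifier-head event. This input format is an assumption.
    """
    for _ in range(iterations):
        # E-step: posterior probability that each event was generated by
        # the semantic sub-model under the current weight.
        responsibilities = []
        for p_sem, p_lex in heldout:
            mixture = lam * p_sem + (1.0 - lam) * p_lex
            if mixture > 0.0:
                responsibilities.append(lam * p_sem / mixture)
        if not responsibilities:
            break  # no event has non-zero probability; keep current weight
        # M-step: the new weight is the mean responsibility.
        lam = sum(responsibilities) / len(responsibilities)
    return lam


def interpolate(p_sem: float, p_lex: float, lam: float) -> float:
    """Combine the two sub-model probabilities with weight lam."""
    return lam * p_sem + (1.0 - lam) * p_lex
```

Estimating the weight on held-out data rather than on the training data keeps it from collapsing onto whichever sub-model has memorized the training events. Each EM iteration is guaranteed not to decrease the held-out likelihood.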
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xiong, D., Li, S., Liu, Q., Lin, S., Qian, Y. (2005). Parsing the Penn Chinese Treebank with Semantic Knowledge. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_7
DOI: https://doi.org/10.1007/11562214_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science (R0)