Abstract
We build a class-based selection preference sub-model to incorporate external semantic knowledge from two Chinese electronic semantic dictionaries. This sub-model is combined with a modifier-head generation sub-model. After optimization on held-out data with the EM algorithm, our improved parser achieves an F1 score of 79.4% on the Penn Chinese Treebank (CTB), a 4.4% relative reduction in error rate. Further analysis of the improvement indicates that semantic knowledge helps with nominal compounds, coordination, and N⋄V tagging disambiguation, and that it alleviates the sparseness of information available in the treebank.
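The abstract does not say how the two sub-models are combined. One common choice, assumed here purely for illustration, is linear interpolation with a single weight tuned by EM on held-out data. In the sketch below, `em_interpolation_weight`, `p_a`, and `p_b` are hypothetical names: `p_a[i]` and `p_b[i]` stand for the probabilities that the two sub-models assign to the i-th held-out event. It is not the authors' implementation.

```python
import math


def log_likelihood(lam, p_a, p_b):
    """Held-out log-likelihood of the mixture lam * p_a + (1 - lam) * p_b."""
    return sum(math.log(lam * a + (1.0 - lam) * b) for a, b in zip(p_a, p_b))


def em_interpolation_weight(p_a, p_b, lam=0.5, iterations=50):
    """Estimate the mixture weight lam by EM on held-out events.

    Assumption: the two sub-models are linearly interpolated. Events to which
    both sub-models assign zero probability are skipped.
    """
    for _ in range(iterations):
        posteriors = []
        for a, b in zip(p_a, p_b):
            mix = lam * a + (1.0 - lam) * b
            if mix > 0.0:
                # E-step: posterior probability that sub-model A produced the event.
                posteriors.append(lam * a / mix)
        if not posteriors:
            break
        # M-step: the new weight is the average posterior responsibility.
        lam = sum(posteriors) / len(posteriors)
    return lam
```

Each EM iteration cannot decrease the held-out likelihood, so the weight moves toward whichever sub-model explains the held-out events better.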
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xiong, D., Li, S., Liu, Q., Lin, S., Qian, Y. (2005). Parsing the Penn Chinese Treebank with Semantic Knowledge. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_7
DOI: https://doi.org/10.1007/11562214_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)