Abstract
Despite the popularity of lexicalized parsing models, practical concerns such as data sparseness and applicability to domains with different vocabularies mean that unlexicalized models, which do not refer to word tokens themselves, deserve more attention. We have developed a classifier-based parser that uses an unlexicalized parsing model. A machine learning method integrates linguistic attributes and information-theoretic attributes in two tasks, namely sentence chunking and phrase recognition. Most importantly, to enhance the accuracy of these tasks, we investigated the notion of distituency (the possibility that two parts of speech cannot remain in the same constituent or phrase) and incorporated it as attributes using various statistical measures. The parser was applied to English and Chinese sentences in the Penn Treebank and the Tsinghua Chinese Treebank. It achieved an F-score of 80.3% on English and 82.4% on Chinese.
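To make the idea of a distituency attribute more concrete, the following is a minimal sketch of one statistical measure that could serve this purpose: pointwise mutual information (PMI) between adjacent part-of-speech tags, where a low score hints that the two tags tend not to share a constituent. The abstract does not say which measures the authors used, so the choice of PMI, the function names pos_bigram_pmi and likely_boundaries, and the threshold value are all assumptions made for illustration only.

```python
import math
from collections import Counter


def pos_bigram_pmi(tagged_sentences):
    """Compute PMI (in bits) for every adjacent POS-tag pair seen in the data.

    tagged_sentences: iterable of lists of POS tags, one list per sentence.
    No word tokens are used, so the statistic is unlexicalized.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tags in tagged_sentences:
        unigrams.update(tags)
        bigrams.update(zip(tags, tags[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    pmi = {}
    for (left, right), count in bigrams.items():
        p_pair = count / n_bi
        p_left = unigrams[left] / n_uni
        p_right = unigrams[right] / n_uni
        pmi[(left, right)] = math.log2(p_pair / (p_left * p_right))
    return pmi


def likely_boundaries(tags, pmi, threshold=0.0):
    """Return positions i where tags[i] and tags[i + 1] look like a distituent.

    A pair counts as a candidate boundary when its PMI is below the
    (hypothetical) threshold; pairs never seen in training are treated
    as boundaries as well.
    """
    return [
        i
        for i, pair in enumerate(zip(tags, tags[1:]))
        if pmi.get(pair, float("-inf")) < threshold
    ]


if __name__ == "__main__":
    # Toy data, invented for illustration; not drawn from any treebank.
    corpus = [
        ["DT", "NN", "VBD", "DT", "NN"],
        ["DT", "JJ", "NN", "VBD"],
    ]
    scores = pos_bigram_pmi(corpus)
    print(likely_boundaries(["DT", "NN", "VBD", "DT", "JJ", "NN"], scores))
```

In a classifier-based parser, scores of this kind would be supplied as numeric attributes alongside the linguistic ones rather than thresholded directly; the thresholding step here only shows how such a score can be read as a boundary signal.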
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chan, S.W.K., Cheung, L.Y.L., Chong, M.W.C. (2010). A Machine Learning Parser Using an Unlexicalized Distituent Model. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_11
DOI: https://doi.org/10.1007/978-3-642-12116-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12115-9
Online ISBN: 978-3-642-12116-6
eBook Packages: Computer Science (R0)