A Machine Learning Parser Using an Unlexicalized Distituent Model

Chan, Samuel W. K.; Cheung, Lawrence Y. L.; Chong, Mickey W. C.

doi:10.1007/978-3-642-12116-6_11

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6008)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Despite the popularity of lexicalized parsing models, practical concerns such as data sparseness and applicability to domains with different vocabularies mean that unlexicalized models, which do not refer to word tokens themselves, deserve more attention. A classifier-based parser using an unlexicalized parsing model has been developed. A machine learning method integrates linguistic attributes and information-theoretic attributes in two tasks, namely sentence chunking and phrase recognition. Most importantly, to enhance the accuracy of these tasks, we investigated the notion of distituency (the possibility that two parts of speech cannot remain in the same constituent or phrase) and incorporated it as attributes using various statistical measures. The parser was applied to parsing English and Chinese sentences in the Penn Treebank and the Tsinghua Chinese Treebank. It achieved a parsing F-score of 80.3% in English and 82.4% in Chinese.
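
The abstract does not say which statistical measures encode distituency. As an illustration only, the sketch below assumes pointwise mutual information (PMI) between adjacent part-of-speech tags, where a low score hints that two tags tend not to share a constituent. The function names `adjacent_tag_pmi` and `boundary_scores`, and the toy tag sequences, are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def adjacent_tag_pmi(tag_sequences):
    """Return {(left_tag, right_tag): PMI} over adjacent POS-tag pairs.

    Assumption: PMI stands in for the paper's unspecified measures.
    No word tokens are used, so the attribute stays unlexicalized.
    """
    left_counts = Counter()
    right_counts = Counter()
    pair_counts = Counter()
    for tags in tag_sequences:
        for left, right in zip(tags, tags[1:]):
            pair_counts[(left, right)] += 1
            left_counts[left] += 1
            right_counts[right] += 1

    total = sum(pair_counts.values())
    pmi = {}
    for (left, right), n in pair_counts.items():
        p_pair = n / total
        p_left = left_counts[left] / total
        p_right = right_counts[right] / total
        pmi[(left, right)] = math.log2(p_pair / (p_left * p_right))
    return pmi


def boundary_scores(tags, pmi, unseen=float("-inf")):
    """Score each gap between adjacent tags; lower means more distituent."""
    return [pmi.get(pair, unseen) for pair in zip(tags, tags[1:])]


# Toy POS-tag sequences (hypothetical data, far too small to be meaningful).
corpus = [
    ["DT", "NN", "VBZ", "DT", "NN"],
    ["DT", "JJ", "NN", "VBD"],
    ["PRP", "VBZ", "DT", "NN"],
]
pmi = adjacent_tag_pmi(corpus)
scores = boundary_scores(["DT", "NN", "VBZ", "DT", "NN"], pmi)
```

In a classifier-based setup, scores like these could be passed as numeric attributes alongside linguistic ones. How the paper actually combines them is not stated in the abstract.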

Author information

Authors and Affiliations

Dept. of Decision Sciences, Chinese University of Hong Kong, Shatin, Hong Kong SAR
Samuel W. K. Chan, Lawrence Y. L. Cheung & Mickey W. C. Chong

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, 07738, Mexico City, Mexico
Alexander Gelbukh

Cite this paper

Chan, S.W.K., Cheung, L.Y.L., Chong, M.W.C. (2010). A Machine Learning Parser Using an Unlexicalized Distituent Model. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_11

DOI: https://doi.org/10.1007/978-3-642-12116-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12115-9
Online ISBN: 978-3-642-12116-6
eBook Packages: Computer Science, Computer Science (R0)
