A Machine Learning Parser Using an Unlexicalized Distituent Model

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2010)

Abstract

Despite the popularity of lexicalized parsing models, practical concerns such as data sparseness and applicability to domains with different vocabularies mean that unlexicalized models, which do not refer to word tokens themselves, deserve more attention. We developed a classifier-based parser that uses an unlexicalized parsing model. A machine learning method integrates linguistic attributes and information-theoretic attributes in two tasks, namely sentence chunking and phrase recognition. Most importantly, to enhance the accuracy of these tasks, we investigated the notion of distituency (the possibility that two parts of speech cannot remain in the same constituent or phrase) and incorporated it as attributes using various statistical measures. The parser was applied to English and Chinese sentences from the Penn Treebank and the Tsinghua Chinese Treebank. It achieved an F-score of 80.3% on English and 82.4% on Chinese.
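The abstract does not say which statistical measures the parser uses, so the following is only an illustrative sketch of how a distituency attribute over part-of-speech pairs might be estimated. It assumes a chunk-annotated corpus and scores each adjacent tag pair by the fraction of its occurrences that straddle a chunk boundary. The function name `distituency_scores`, the toy treebank, and the relative-frequency measure are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def distituency_scores(chunked_sentences):
    """Estimate, for each adjacent POS-tag pair, how often a chunk
    boundary separates the two tags (0.0 = never, 1.0 = always).

    Hypothetical measure: a plain relative frequency, not the paper's.
    """
    split = Counter()  # times the pair was separated by a boundary
    total = Counter()  # times the pair occurred adjacently at all
    for chunks in chunked_sentences:
        # Flatten to (tag, chunk_index) so neighbours can be compared.
        flat = [(tag, i) for i, chunk in enumerate(chunks) for tag in chunk]
        for (left, li), (right, ri) in zip(flat, flat[1:]):
            total[(left, right)] += 1
            if li != ri:
                split[(left, right)] += 1
    return {pair: split[pair] / total[pair] for pair in total}


# Toy data (assumed): each sentence is a list of chunks of POS tags.
toy_treebank = [
    [["DT", "NN"], ["VBD"], ["DT", "JJ", "NN"]],
    [["DT", "JJ", "NN"], ["VBD"], ["IN"], ["DT", "NN"]],
    [["NN"], ["VBD"], ["DT", "NN"]],
]

scores = distituency_scores(toy_treebank)
print(scores[("DT", "NN")])   # 0.0: determiner and noun always share a chunk
print(scores[("NN", "VBD")])  # 1.0: noun and following verb are always split
```

Because the scores are computed over tag pairs rather than word tokens, an attribute of this kind stays unlexicalized. A classifier could take the score of each adjacent pair as one numeric input alongside other attributes.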

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chan, S.W.K., Cheung, L.Y.L., Chong, M.W.C. (2010). A Machine Learning Parser Using an Unlexicalized Distituent Model. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_11

  • DOI: https://doi.org/10.1007/978-3-642-12116-6_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12115-9

  • Online ISBN: 978-3-642-12116-6

  • eBook Packages: Computer Science, Computer Science (R0)