
Structure-guided supertagger learning

Published online by Cambridge University Press:  14 March 2012

YAO-ZHONG ZHANG
Affiliation:
Department of Computer Science, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan email: yaozhong.zhang@is.s.u-tokyo.ac.jp, matuzaki@is.s.u-tokyo.ac.jp
TAKUYA MATSUZAKI
Affiliation:
Department of Computer Science, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan email: yaozhong.zhang@is.s.u-tokyo.ac.jp, matuzaki@is.s.u-tokyo.ac.jp
JUN'ICHI TSUJII
Affiliation:
Microsoft Research Asia, Haidian District, Beijing 100080, China e-mail: jtsujii@microsoft.com

Abstract

This paper examines structural learning for the supertagging task. Supertagging assigns the most probable lexical entry to each word in a sentence. A supertagger is extremely important for a lexicalized grammar parser because an accurate supertagger can greatly reduce lexical ambiguity in the downstream parser. Supertagging is more challenging than conventional sequence labeling tasks (e.g., part-of-speech tagging) for two reasons. First, the supertags are numerous: they are the lexical entries defined in a lexicalized grammar, and they carry rich syntactic/semantic information. Second, the inter-supertag relations are more complex: a proper supertag assignment must be compatible with the other supertag assignments in the sentence so that a parse tree can be constructed. Commonly used adjacent-label features (e.g., first-order edge features) in a sequence labeling model are too coarse for the supertagging task, for which long-range information is extremely important. We propose two approaches that take long-range information into account in a supertagger's training stage. The dependency-informed supertagger uses word-to-word dependencies derived from a dependency parser to generate long-range features, which act as soft constraints in training. The forest-guided supertagger constrains the classifier to learn in a grammar-satisfying space, using a CFG filter to impose grammar constraints on the updates of the model parameters. Experiments show that the proposed structure-guided supertaggers perform significantly better than the baseline supertaggers, and that the improved supertaggers also improve the F-score of the final parser. Using the forest-guided supertagger in a shift-reduce HPSG parser, we achieved a competitive parsing performance of 89.31% F-score with higher parsing speed than that of a state-of-the-art HPSG parser.
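To make the idea of dependency-derived long-range features concrete, the following is a minimal sketch, not the feature set used in the paper. It assumes each token comes with a part-of-speech tag and the index of its dependency head (as a dependency parser might supply); the function name `extract_features` and all feature templates are hypothetical, chosen only for illustration.

```python
from typing import List, Optional


def extract_features(
    words: List[str],
    pos: List[str],
    heads: List[Optional[int]],
    i: int,
) -> List[str]:
    """Return illustrative feature strings for token i.

    heads[k] is the index of the dependency head of token k, or None if
    token k is the root. All templates below are hypothetical examples.
    """
    # Local features: the word itself and its POS tag.
    feats = [f"w0={words[i]}", f"p0={pos[i]}"]

    # Adjacent-context features, as in a conventional sequence labeler.
    for off in (-1, 1):
        j = i + off
        w = words[j] if 0 <= j < len(words) else "<PAD>"
        feats.append(f"w{off:+d}={w}")

    # Long-range features from the dependency head of token i.
    h = heads[i]
    if h is None:
        feats.append("head=<ROOT>")
    else:
        feats.append(f"head_w={words[h]}")
        feats.append(f"head_p={pos[h]}")
        feats.append(f"head_dir={'L' if h < i else 'R'}")

    # Long-range features from the dependents of token i.
    for d, hd in enumerate(heads):
        if hd == i:
            feats.append(f"dep_p={pos[d]}")

    return feats


# Toy sentence: "She quickly left", with "left" as the root.
words = ["She", "quickly", "left"]
pos = ["PRP", "RB", "VBD"]
heads = [2, 2, None]

feats_she = extract_features(words, pos, heads, 0)
feats_left = extract_features(words, pos, heads, 2)
```

In this toy sentence the features for "She" include the word and tag of its head "left", which a first-order edge feature over adjacent labels would not capture. Because such features only contribute to the score, they act as soft rather than hard constraints on the supertag choice.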

Type
Articles
Copyright
Copyright © Cambridge University Press 2012

