Structure-guided supertagger learning

YAO-ZHONG ZHANG; TAKUYA MATSUZAKI; JUN'ICHI TSUJII

doi:10.1017/S1351324912000034

Published online by Cambridge University Press: 14 March 2012

YAO-ZHONG ZHANG: Affiliation:
Department of Computer Science, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan email: yaozhong.zhang@is.s.u-tokyo.ac.jp, matuzaki@is.s.u-tokyo.ac.jp
TAKUYA MATSUZAKI: Affiliation:
Department of Computer Science, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan email: yaozhong.zhang@is.s.u-tokyo.ac.jp, matuzaki@is.s.u-tokyo.ac.jp
JUN'ICHI TSUJII: Affiliation:
Microsoft Research Asia, Haidian District, Beijing 100080, China email: jtsujii@microsoft.com

Abstract

In this paper, we examine the structural learning problem of supertagging. Supertagging is the task of assigning the most probable lexical entry to each word in a sentence. A supertagger is extremely important for a lexicalized grammar parser because an accurate supertagger can greatly reduce the lexical ambiguity faced by the downstream parser. Supertagging is more challenging than conventional sequence labeling tasks such as part-of-speech tagging, for two reasons. First, there are many supertags: supertags are the lexical entries defined in a lexicalized grammar, and they encode rich syntactic and semantic information. Second, the relations among supertags are more complex: a proper supertag assignment is expected to be compatible with the other supertag assignments in the sentence so that together they can construct a parse tree. The adjacent-label features commonly used in sequence labeling models (e.g., first-order edge features) are therefore too coarse for supertagging, and long-range information becomes extremely important. We propose two approaches that incorporate long-range information in the training stage of a supertagger. The dependency-informed supertagger uses word-to-word dependencies produced by a dependency parser to generate long-range features, which act as soft constraints during training. The forest-guided supertagger constrains the classifier to learn in a grammar-satisfying space, using a CFG filter to impose grammar constraints on the update of model parameters. Experiments show that the proposed structure-guided supertaggers perform significantly better than the baseline supertaggers, and that the improved supertaggers also raise the F-score of the final parser. Using the forest-guided supertagger in a shift-reduce HPSG parser, we achieved a competitive F-score of 89.31% with a higher parsing speed than a state-of-the-art HPSG parser.
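The forest-guided idea (learning only in a grammar-satisfying space, with a CFG filter deciding which supertag sequences count) can be illustrated with a minimal sketch. It assumes a structured perceptron with first-order features, a toy binary CFG over supertags standing in for a CFG approximation of the lexicalized grammar, and a CKY recognizer as the filter. Everything here (`RULES`, `LEXICON`, `cky_accepts`, `decode`, `train`) is hypothetical and is not the authors' model, feature set, or search procedure, none of which this abstract specifies. Decoding enumerates candidate sequences by brute force, which is exponential and only suitable for toy inputs.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy CFG over supertags: (left, right) -> parent.
RULES = {
    ("NP", "VP"): "S",
    ("TV", "NP"): "VP",
    ("TV", "SC"): "VP",  # "saw [her duck]" read as a small clause
    ("NP", "IV"): "SC",
    ("DET", "N"): "NP",
}

# Hypothetical lexicon: candidate supertags for each word.
LEXICON = {
    "they": ["NP"], "saw": ["TV"], "the": ["DET"],
    "her": ["DET", "NP"], "duck": ["N", "IV"],
}

def cky_accepts(tags, rules=RULES, start="S"):
    """CFG filter: True if the supertags can be combined into `start` via CKY."""
    n = len(tags)
    # chart[i][j] holds every category that spans tags[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, t in enumerate(tags):
        chart[i][i + 1].add(t)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        if (left, right) in rules:
                            chart[i][j].add(rules[(left, right)])
    return start in chart[0][n]

def features(words, tags):
    """First-order features: word/tag pairs and adjacent tag bigrams."""
    feats = defaultdict(int)
    prev = "<s>"
    for w, t in zip(words, tags):
        feats[("word", w, t)] += 1
        feats[("edge", prev, t)] += 1
        prev = t
    return feats

def score(weights, feats):
    return sum(weights[f] * v for f, v in feats.items())

def decode(weights, words, lexicon):
    """Highest-scoring supertag sequence among those the CFG filter accepts."""
    best, best_score = None, float("-inf")
    for tags in product(*(lexicon[w] for w in words)):
        if not cky_accepts(tags):
            continue  # never predict (or update against) ungrammatical sequences
        s = score(weights, features(words, tags))
        if s > best_score:
            best, best_score = list(tags), s
    return best

def train(data, lexicon, epochs=5):
    """Perceptron updates: gold features minus filtered-prediction features."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = decode(weights, words, lexicon)
            if pred is not None and pred != gold:
                for f, v in features(words, gold).items():
                    weights[f] += v
                for f, v in features(words, pred).items():
                    weights[f] -= v
    return weights

DATA = [
    ("they saw her duck".split(), ["NP", "TV", "NP", "IV"]),
    ("they saw the duck".split(), ["NP", "TV", "DET", "N"]),
]

if __name__ == "__main__":
    w = train(DATA, LEXICON)
    print(decode(w, "they saw her duck".split(), LEXICON))  # ['NP', 'TV', 'NP', 'IV']
```

Because `decode` only returns grammar-satisfying sequences, every update contrasts the gold sequence with a competitor the toy grammar would also accept. A sequence such as `NP TV NP N` is rejected by the filter outright, regardless of the weights, so no parameter mass is spent separating the gold analysis from sequences a parser could never use.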

Type: Articles
Information: Natural Language Engineering, Volume 18, Special Issue 2: Statistical Learning of Natural Language Structured Input and Output, April 2012, pp. 205–234

DOI: https://doi.org/10.1017/S1351324912000034
Copyright: Copyright © Cambridge University Press 2012
