Abstract
Technology for natural language analysis using linguistically precise grammars has matured to a level of coverage and efficiency that enables parsing of large amounts of running text. Research groups working within grammatical frameworks like Combinatory Categorial Grammar (CCG; Clark and Curran, 2004), Lexical-Functional Grammar (LFG; Riezler et al., 2002), and Head-Driven Phrase Structure Grammar (HPSG; Malouf and van Noord, 2004; Oepen et al., 2004; Miyao et al., 2005) have successfully integrated broad-coverage computational grammars with sophisticated statistical parse selection models. The former delineate the space of possible analyses, while the latter provide a probability distribution over competing hypotheses. Parse selection approaches for these frameworks often use discriminative Maximum Entropy (ME) models, where the probability of each parse tree, given an input string, is estimated on the basis of select properties (called features) of the tree (Abney, 1997; Johnson et al., 1999). Such features, in principle, are not restricted in their domain of locality, and enable the parse selection process to take into account properties that extend beyond local contexts (i.e. sub-trees of depth one).
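The discriminative ME approach described above can be made concrete with a small sketch: the probability of a parse tree, given the input string, is proportional to the exponentiated dot product of a weight vector and the tree's feature vector, normalized over the competing analyses of that same string. This is a minimal illustration only; the feature names and weights below are hypothetical, not taken from any of the cited systems.

```python
import math

def score(tree_feats, weights):
    """Linear score of one candidate parse: theta . f(t)."""
    return sum(weights.get(f, 0.0) * v for f, v in tree_feats.items())

def parse_probabilities(candidates, weights):
    """Conditional log-linear distribution over the parses of one input string."""
    scores = [score(f, weights) for f in candidates]
    m = max(scores)                    # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)                      # partition function over this string's parses
    return [e / z for e in exps]

# Two competing analyses of one string, described by hypothetical feature counts.
# Note the grandparenting feature, whose domain of locality exceeds depth-one sub-trees.
weights = {"rule:vp->v_np": 0.8, "grandparent:s/vp": 0.3, "rule:np->np_pp": -0.5}
t1 = {"rule:vp->v_np": 1, "grandparent:s/vp": 1}
t2 = {"rule:np->np_pp": 1}
probs = parse_probabilities([t1, t2], weights)
```

Because the normalization runs only over the parses of the given string, the model is conditional (discriminative) rather than a joint distribution over strings and trees.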
Notes
1. Deep Linguistic Processing with HPSG, an open-source repository of grammars and processing tools; see http://www.delph-in.net/.
2. This property of parse forests is not a prerequisite of the chart parsing framework.
3. Our graphical representation of the forest closely resembles the data structures actually used during parsing. Sets of packed edges (indicated by ovals) correspond to “or” (or disjunctive) nodes, when viewing the forest as a general and-or graph; in this view, the edges themselves (drawn as boxes in Fig. 13.2) correspond to “and” (or conjunctive) nodes. Conversely, “or” nodes are represented as vertices in the conceptualization of parse forests as hypergraphs (Klein and Manning, 2001; Huang and Chiang, 2005), where hyperarcs correspond to “and” nodes.
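The and-or view of a packed forest can be sketched as a pair of node types: an “and” node is one chart edge (a rule application over child “or” nodes), and an “or” node packs the competing edges covering the same span. The sketch below, with invented labels, also shows why unpacking is exponential in general: enumerating the trees below an “or” node takes a cross-product over the alternatives of every child.

```python
from itertools import product

class AndNode:
    """A chart edge: one rule application over child or-nodes."""
    def __init__(self, label, children=()):
        self.label, self.children = label, list(children)

class OrNode:
    """A packed set of competing edges covering the same span."""
    def __init__(self, alternatives):
        self.alternatives = list(alternatives)

def unpack(or_node):
    """Enumerate every tree in the forest below an or-node."""
    trees = []
    for edge in or_node.alternatives:
        if not edge.children:
            trees.append(edge.label)
            continue
        # Cross-product over the unpacked trees of each child or-node.
        for combo in product(*(unpack(c) for c in edge.children)):
            trees.append((edge.label, *combo))
    return trees

# A toy forest: one local ambiguity packed into a single or-node.
saw = OrNode([AndNode("saw")])
np1 = OrNode([AndNode("NP-a"), AndNode("NP-b")])  # two competing NP analyses
vp = OrNode([AndNode("VP", [saw, np1])])
trees = unpack(vp)
```

In the hypergraph view, each `OrNode` is a vertex and each `AndNode` is a hyperarc pointing from its children to that vertex.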
4. The treebank comprises several booklets of edited, instructional texts on backcountry activities in Norway. The data is available from the LOGON web site at http://www.emmtee.net.
5. The data in this treebank is taken from transcribed appointment scheduling dialogues; see http://gg.dfki.de/ for further information on GG and its treebank.
6. The models were trained with the open-source TADM package (Malouf, 2002), using default hyper-parameters for all configurations, viz. a convergence threshold of \(10^{-8}\), variance of the prior of \(10^{-4}\), and frequency cut-off of 5. It is likely that further optimization of hyper-parameters for individual configurations would moderately improve model performance, especially for higher-order grandparenting levels with large numbers of features.
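The two hyper-parameters named above have simple interpretations, sketched below with illustrative helpers (these are not TADM's actual API): the frequency cut-off drops features observed fewer than 5 times in the training treebank, and the prior variance \(\sigma^2 = 10^{-4}\) controls a Gaussian penalty subtracted from the log-likelihood during estimation.

```python
def apply_frequency_cutoff(feature_counts, cutoff=5):
    """Drop features seen fewer than `cutoff` times in the training data."""
    return {f for f, n in feature_counts.items() if n >= cutoff}

def penalized_objective(log_likelihood, weights, variance=1e-4):
    """Log-likelihood minus a Gaussian (L2) prior penalty on the weights.

    A small variance, as used here, penalizes large weights heavily and so
    regularizes strongly; the estimator maximizes this penalized objective
    until successive improvements fall below the convergence threshold.
    """
    penalty = sum(w * w for w in weights) / (2 * variance)
    return log_likelihood - penalty
```

The cut-off and the prior address the same risk from different directions: both limit the influence of rarely observed features on the estimated model.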
References
Abney, S.P. (1997). Stochastic attribute-value grammars. Computational Linguistics 23, 597–618.
Billot, S. and B. Lang (1989). The structure of shared forests in ambiguous parsing. In Proceedings of the 27th Meeting of the Association for Computational Linguistics, Vancouver, BC, pp. 143–151.
Callmeier, U. (2002). Preprocessing and encoding techniques in PET. In S. Oepen, D. Flickinger, J. Tsujii, and H. Uszkoreit (Eds.), Collaborative Language Engineering. A Case Study in Efficient Grammar-based Processing. Stanford, CA: CSLI Publications.
Caraballo, S.A. and E. Charniak (1998). New figures of merit for best-first probabilistic chart parsing. Computational Linguistics 24(2), 275–298.
Carroll, J. and S. Oepen (2005). High-efficiency realization for a wide-coverage unification grammar. In R. Dale and K.F. Wong (Eds.), Proceedings of the 2nd International Joint Conference on Natural Language Processing, Lecture Notes in Artificial Intelligence, vol. 3651. Jeju, Korea: Springer, pp. 165–176.
Clark, S. and J.R. Curran (2004). Parsing the WSJ using CCG and log-linear models. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics, Barcelona, pp. 104–111.
Clark, S. and J.R. Curran (2007). Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics 33(4), 493–552.
Copestake, A. (2002). Implementing Typed Feature Structure Grammars. Stanford, CA: CSLI Publications.
Crysmann, B. (2005). Relative clause extraposition in German. An efficient and portable implementation. Research on Language and Computation 3(1), 61–82.
Erbach, G. (1991). A flexible parser for a linguistic development environment. In O. Herzog and C.R. Rollinger (Eds.), Text Understanding in LILOG. Berlin: Springer, pp. 74–87.
Flickinger, D. (2000). On building a more efficient grammar by exploiting types. Natural Language Engineering 6(1), 15–28.
Geman, S. and M. Johnson (2002). Dynamic programming for parsing and estimation of stochastic unification-based grammars. In Proceedings of the 40th Meeting of the Association for Computational Linguistics, Philadelphia, PA.
Huang, L. (2008). Forest reranking: discriminative parsing with non-local features. In Proceedings of the ACL-08: HLT, Columbus, OH.
Huang, L. and D. Chiang (2005). Better k-best parsing. In Proceedings of the 9th International Workshop on Parsing Technologies, Vancouver, BC, pp. 53–64.
Jiménez, V.M. and A. Marzal (2000). Computation of the n best parse trees for weighted and stochastic context-free grammars. In Proceedings of the Joint International Workshops on Advances in Pattern Recognition. London: Springer, pp. 183–192.
Johnson, M., S. Geman, S. Canon, Z. Chi, and S. Riezler (1999). Estimators for stochastic ‘unification-based’ grammars. In Proceedings of the 37th Meeting of the Association for Computational Linguistics, College Park, MD, pp. 535–541.
Kasami, T. (1965). An efficient recognition and syntax algorithm for context-free languages. Technical Report 65-758, Air Force Cambridge Research Laboratory, Bedford, MA.
Klein, D. and C.D. Manning (2001). Parsing and hypergraphs. In Proceedings of the 7th International Workshop on Parsing Technologies, Beijing, pp. 123–134.
Klein, D. and C.D. Manning (2003). A\(^{\ast}\) parsing. Fast exact Viterbi parse selection. In Proceedings of the 4th Conference of the North American Chapter of the ACL, Edmonton.
Lang, B. (1994). Recognition can be harder than parsing. Computational Intelligence 10(4), 486–494.
Langkilde, I. (2000). Forest-based statistical sentence generation. In Proceedings of the 1st Conference of the North American Chapter of the ACL, Seattle, WA.
Malouf, R. (2002). A comparison of algorithms for maximum entropy parameter estimation. In Proceedings of the 6th Conference on Natural Language Learning, Taipei.
Malouf, R. and G. van Noord (2004). Wide coverage parsing with stochastic attribute value grammars. In Proceedings of the IJCNLP Workshop Beyond Shallow Analysis, Hainan.
Miyao, Y., T. Ninomiya, and J. Tsujii (2005). Corpus-oriented grammar development for acquiring a Head-Driven Phrase Structure Grammar from the Penn Treebank. In K.Y. Su, J. Tsujii, J.H. Lee, and O.Y. Kwong (Eds.), Natural Language Processing, Hainan Island, Lecture Notes in Artificial Intelligence, vol. 3248. Berlin: Springer, pp. 684–693.
Miyao, Y. and J. Tsujii (2008). Feature forest models for probabilistic HPSG parsing. Computational Linguistics 34(1), 35–88.
Moore, R.C. and H. Alshawi (1992). Syntactic and semantic processing. In H. Alshawi (Ed.), The Core Language Engine. Cambridge, MA: MIT Press, pp. 129–148.
Müller, S. and W. Kasper (2000). HPSG analysis of German. In W. Wahlster (Ed.), Verbmobil. Foundations of Speech-to-Speech Translation (Artificial Intelligence ed.). Berlin: Springer, pp. 238–253.
Oepen, S. and J. Carroll (2002). Efficient parsing for unification-based grammars. In S. Oepen, D. Flickinger, J. Tsujii, and H. Uszkoreit (Eds.), Collaborative Language Engineering. A Case Study in Efficient Grammar-based Processing. Stanford, CA: CSLI Publications, pp. 195–225.
Oepen, S., D. Flickinger, K. Toutanova, and C.D. Manning (2004). LinGO Redwoods. A rich and dynamic treebank for HPSG. Journal of Research on Language and Computation 2(4), 575–596.
Riezler, S., T.H. King, R.M. Kaplan, R. Crouch, J.T. Maxwell III, and M. Johnson (2002). Parsing the Wall Street Journal using a Lexical-Functional Grammar and discriminative estimation techniques. In Proceedings of the 40th Meeting of the Association for Computational Linguistics, Philadelphia, PA.
Toutanova, K., C.D. Manning, D. Flickinger, and S. Oepen (2005). Stochastic HPSG parse selection using the Redwoods corpus. Journal of Research on Language and Computation 3(1), 83–105.
© 2010 Springer Science+Business Media B.V.
Zhang, Y., S. Oepen, and J. Carroll (2010). Efficiency in unification-based n-best parsing. In H. Bunt, P. Merlo, and J. Nivre (Eds.), Trends in Parsing Technology, Text, Speech and Language Technology, vol. 43. Dordrecht: Springer. https://doi.org/10.1007/978-90-481-9352-3_13
Print ISBN: 978-90-481-9351-6
Online ISBN: 978-90-481-9352-3