Abstract
This paper motivates and describes treebank annotation for Japanese and English following a scheme adapted from the Annotation manual for the Penn Historical Corpora and the PCEEC (Santorini 2010). The purpose of the annotation is to create a syntactic base from which meaning representations can be built automatically on a corpus-linguistics scale (thousands of examples). The paper highlights advantages of the adopted annotation scheme. Most notably, marking clause-level functional information is essential for deterministically building meaning representations beyond the level of predicate-argument structure. In addition, an internal syntax in which phrasal categories are fundamentally similar is of great assistance. Finally, the paper demonstrates that scope information is simple to add when the bracketed syntactic structure is inherently flat.
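The claim that clause-level functional tags permit deterministic construction of a meaning representation can be illustrated with a minimal sketch. It is not the authors' system: the example tree, the helper names (parse, words, clause_roles, event_formula) and the output notation are assumptions made for illustration. The tags NP-SBJ and NP-OB1 follow the style of the Penn Historical Corpora scheme, and the event-style output is only loosely in the spirit of Davidson (1967).

```python
import re


def parse(text):
    """Parse one bracketed tree into nested (label, children) tuples.

    Leaves (words) are plain strings.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return node()


def words(tree):
    """Collect the words under a node, left to right."""
    _, children = tree
    out = []
    for child in children:
        if isinstance(child, str):
            out.append(child)
        else:
            out.extend(words(child))
    return out


def clause_roles(tree):
    """Read the verb and the function-tagged daughters off a flat clause.

    Assumption: verbs carry a tag starting with "VB" and arguments carry
    a function suffix after a hyphen (for example NP-SBJ, NP-OB1).
    """
    _, children = tree
    roles = {}
    for child in children:
        if isinstance(child, str):
            continue
        tag = child[0]
        if tag.startswith("VB"):
            roles["pred"] = " ".join(words(child))
        elif "-" in tag:
            function = tag.split("-", 1)[1]
            roles[function] = " ".join(words(child))
    return roles


def event_formula(roles):
    """Build an event-style formula string from the roles (hypothetical notation)."""
    conjuncts = [f"{roles['pred']}(e)"]
    for function, phrase in roles.items():
        if function != "pred":
            conjuncts.append(f"{function}(e, '{phrase}')")
    return "exists e. " + " & ".join(conjuncts)


if __name__ == "__main__":
    tree = parse("(IP-MAT (NP-SBJ (D the) (N dog)) (VBD chased) (NP-OB1 (D a) (N cat)))")
    print(event_formula(clause_roles(tree)))
```

Because the clause is flat and every argument carries its function on its own label, the mapping needs no search or disambiguation: each daughter of the clause contributes exactly one conjunct.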
This research has been supported by the JST PRESTO program (Synthesis of Knowledge for Information Oriented Society). We wish to thank attendees of LENLS9 for comments received that prompted improvements of the paper.
References
Bies, A., Ferguson, M., Katz, K., MacIntyre, R.: Bracketing guidelines for Treebank II style Penn Treebank project. Tech. Rep. MS-CIS-95-06, LINC LAB 281, University of Pennsylvania Computer and Information Science Department (1995)
Bies, A., Maamouri, M.: Penn Arabic Treebank Guidelines. Tech. rep., Linguistic Data Consortium, University of Pennsylvania. DRAFT (2003)
Blackburn, P., Bos, J.: Computational semantics. Theoria 13, 27–45 (2003)
Bos, J., Clark, S., Steedman, M., Curran, J.R., Hockenmaier, J.: Wide-coverage semantic representations from a CCG parser. In: Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland (2004)
Butler, A.: The Semantics of Grammatical Dependencies. Current Research in the Semantics/Pragmatics Interface, vol. 23. Emerald, Bingley (2010)
Butler, A., Yoshimoto, K.: Banking meaning representations from treebanks. Linguistic Issues in Language Technology - LiLT 7(1), 1–22 (2012)
Butler, A., Zhou, Z., Yoshimoto, K.: Problems for successful bunsetsu based parsing and some solutions. In: Proceedings of the Eighteenth Annual Meeting of the Association of Natural Language Processing, pp. 951–954. The Association of Natural Language Processing (2012)
Cahill, A., McCarthy, M., van Genabith, J., Way, A.: Automatic annotation of the Penn Treebank with LFG F-structure information. In: LREC 2002 Workshop on Linguistic Knowledge Acquisition and Representation—Bootstrapping Annotated Language Data, Las Palmas, Spain, pp. 8–15 (2002)
Davidson, D.: The logical form of action sentences. In: Rescher, N. (ed.) The Logic of Decision and Action. University of Pittsburgh Press, Pittsburgh (1967); Reprinted in: Davidson, D.: Essays on Actions and Events, pp. 105–122. Clarendon Press, Oxford (1980)
Dekker, P.: Dynamic Semantics. Studies in Linguistics and Philosophy, vol. 91. Springer, Dordrecht (2012)
Han, C.-H., Han, N.-R., Ko, E.-S.: Bracketing guidelines for Penn Korean TreeBank. Tech. Rep. IRCS Report 01-10, Institute for Research in Cognitive Science, University of Pennsylvania (2001)
Hashimoto, S.: Essentials of Japanese Grammar (Kokugoho Yousetsu). Iwanami (1934) (in Japanese)
Kawahara, D., Sasano, R., Kurohashi, S., Hashida, K.: Specification for annotating case, ellipsis and coreference. Kyoto Text Corpus Version 4.0 (2005) (in Japanese)
King, T.H., Crouch, R., Riezler, S., Dalrymple, M., Kaplan, R.M.: The PARC 700 Dependency Bank. In: Proceedings of the 4th International Workshop on Linguistically Interpreted Corpora, held at the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2003), Budapest (2003)
Kurohashi, S., Nagao, M.: Building a Japanese parsed corpus – while improving the parsing system. In: Abeillé, A. (ed.) Treebanks: Building and Using Parsed Corpora, ch. 14, pp. 249–260. Kluwer Academic Publishers, Dordrecht (2003)
Miyao, Y., Ninomiya, T., Tsujii, J.: Corpus-oriented grammar development for acquiring a Head-driven Phrase Structure Grammar from the Penn Treebank. In: Su, K.-Y., Tsujii, J., Lee, J.-H., Kwong, O.Y. (eds.) IJCNLP 2004. LNCS (LNAI), vol. 3248, pp. 684–693. Springer, Heidelberg (2005)
Palmer, M., Gildea, D., Kingsbury, P.: The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics 31(1), 71–106 (2005)
Santorini, B.: Annotation manual for the Penn Historical Corpora and the PCEEC (Release 2). Tech. rep., Department of Computer and Information Science, University of Pennsylvania, Philadelphia (2010), http://www.ling.upenn.edu/histcorpora/annotation
Vermeulen, C.F.M.: Variables as stacks: A case study in dynamic model theory. Journal of Logic, Language and Information 9, 143–167 (2000)
Xia, F., Palmer, M., Joshi, A.: A uniform method of grammar extraction and its applications. In: Proceedings of the 2000 Conference on Empirical Methods in Natural Language Processing, Hong Kong, pp. 53–62 (2000)
Xue, N., Xia, F.: The bracketing guidelines for the Penn Chinese Treebank (3.0). Tech. Rep. 00-08, Institute for Research in Cognitive Science, University of Pennsylvania (2000)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Butler, A., Otomo, R., Zhou, Z., Yoshimoto, K. (2013). Treebank Annotation for Formal Semantics Research. In: Motomura, Y., Butler, A., Bekki, D. (eds) New Frontiers in Artificial Intelligence. JSAI-isAI 2012. Lecture Notes in Computer Science, vol 7856. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39931-2_3
DOI: https://doi.org/10.1007/978-3-642-39931-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39930-5
Online ISBN: 978-3-642-39931-2
eBook Packages: Computer Science (R0)