Semi-automatic documentation of an implemented linguistic grammar augmented with a treebank

Hashimoto, Chikara; Bond, Francis; Tanaka, Takaaki; Siegel, Melanie

doi:10.1007/s10579-008-9065-9

Published: 27 February 2008

Language Resources and Evaluation, Volume 42, pages 117–126 (2008)

Abstract

We have constructed a large-scale, detailed database of lexical types in Japanese from a treebank annotated with rich linguistic information. The database helps treebank annotators and grammar developers share precise knowledge about the grammatical status of the words that make up the treebank, which supports consistent large-scale treebanking and grammar development. In addition, it clarifies, on the basis of the treebank, which lexical types are needed for precise Japanese NLP. In this paper, we report on the motivation for and methodology of the database construction.

Notes

Currently, the Hinoki treebank contains about 121,000 sentences (about 10 words per sentence).
http://wiki.delph-in.net/moin/JacyLexTypes
We think we also need another snapshot, that of the grammar rules and principles being used. In this paper, however, we do not deal with it.
These are actual names of the lexical types implemented in our grammar and might not be understandable to people in general.
The object, a conclusion, is expressed by a phonologically null pronoun.
Note that this information is not explicitly stored in the database. Rather, it is dynamically compiled from the database together with a lexicon database, when triggered by a user query. User queries are words like ni.
http://www.linguistics-ontology.org/gold.html

Acknowledgements

We would like to thank the other members of the NTT Natural Language Group, Dan Flickinger, Stephan Oepen, and Jason Katz-Brown for their stimulating discussions.

Author information

Authors and Affiliations

Graduate School of Science and Engineering, Yamagata University, Yamagata, Japan
Chikara Hashimoto
Computational Linguistics Group, National Institute of Information and Communications Technology, Kyoto, Japan
Francis Bond
Machine Translation Research Group, NTT Communication Science Laboratories, Soraku-gun, Japan
Takaaki Tanaka
Acrolinx GmbH, Rosenstr.2, 10178, Berlin, Germany
Melanie Siegel

Corresponding author

Correspondence to Chikara Hashimoto.

About this article

Cite this article

Hashimoto, C., Bond, F., Tanaka, T. et al. Semi-automatic documentation of an implemented linguistic grammar augmented with a treebank. Lang Resources & Evaluation 42, 117–126 (2008). https://doi.org/10.1007/s10579-008-9065-9

Received: 25 August 2006
Accepted: 30 January 2008
Published: 27 February 2008
Issue Date: May 2008
DOI: https://doi.org/10.1007/s10579-008-9065-9
