Semi-automatic documentation of an implemented linguistic grammar augmented with a treebank

Language Resources and Evaluation

Abstract

We have constructed a large-scale and detailed database of lexical types in Japanese from a treebank that includes detailed linguistic information. The database helps treebank annotators and grammar developers to share precise knowledge about the grammatical status of words that constitute the treebank, allowing for consistent large-scale treebanking and grammar development. In addition, it clarifies which lexical types are needed for precise Japanese NLP on the basis of the treebank. In this paper, we report on the motivation and methodology of the database construction.
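The abstract does not spell out the construction procedure, so the following is only a minimal sketch of the general idea of deriving a lexical-type database from treebank leaves. It assumes, hypothetically, that each treebanked sentence is available as a list of (surface form, lexical type) pairs; the function name `build_type_db`, the `max_examples` parameter, and the record layout are illustrative inventions, not the authors' tooling.

```python
from collections import Counter, defaultdict


def build_type_db(treebank, max_examples=2):
    """Aggregate treebank leaves into a per-lexical-type summary.

    `treebank` is assumed (hypothetically) to be a list of sentences,
    each a list of (surface form, lexical type) pairs. For every type we
    record how often each word was assigned to it and keep a few example
    sentences in which it occurs.
    """
    db = defaultdict(lambda: {"words": Counter(), "examples": []})
    for sentence in treebank:
        text = " ".join(surface for surface, _ in sentence)
        for surface, lextype in sentence:
            entry = db[lextype]
            entry["words"][surface] += 1
            # Keep at most `max_examples` distinct example sentences per type.
            if len(entry["examples"]) < max_examples and text not in entry["examples"]:
                entry["examples"].append(text)
    return dict(db)


# Toy input: "A", "B", "C", "t1", "t2" are placeholders, not real data.
treebank = [[("A", "t1"), ("ni", "t2")], [("B", "t1"), ("ni", "t2")], [("C", "t1")]]
db = build_type_db(treebank)
print(db["t2"]["words"]["ni"])
```

Grouping by type rather than by word matches the abstract's framing: annotators and grammar developers consult one record per lexical type, with the words and example sentences that instantiate it.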

Fig. 1
Fig. 2

Notes

  1. Currently, the Hinoki treebank contains about 121,000 sentences (about 10 words per sentence).

  2. http://wiki.delph-in.net/moin/JacyLexTypes

  3. We believe another snapshot is also needed: one of the grammar rules and principles in use. This paper, however, does not address it.

  4. These are the actual names of the lexical types implemented in our grammar and may not be self-explanatory to readers unfamiliar with it.

  5. The object, a conclusion, is expressed by a phonologically null pronoun.

  6. Note that this information is not stored explicitly in the database. Rather, it is compiled dynamically from the database together with a lexicon database whenever a user submits a query, which is a word such as ni.

  7. http://www.linguistics-ontology.org/gold.html
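Note 6 describes information that is not stored but compiled on demand from the lexical-type database and a lexicon database when a user queries a word such as ni. The sketch below illustrates that query-time join under assumed, hypothetical schemas: a lexicon mapping surface forms to lexical-type names, and a type database mapping names to documentation records. `TypeRecord`, `compile_on_query`, and the type names are placeholders, not actual JACY types or the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeRecord:
    """Documentation stored for one lexical type (hypothetical schema)."""
    name: str
    description: str


def compile_on_query(word, lexicon, type_db):
    """Assemble documentation for `word` at query time.

    `lexicon` maps a surface form to the names of the lexical types it is
    assigned to; `type_db` maps a type name to its TypeRecord. The result
    is recomputed on every call rather than stored.
    """
    return [type_db[name] for name in lexicon.get(word, []) if name in type_db]


# Hypothetical toy data; the type names are placeholders, not JACY types.
lexicon = {"ni": ["type-a", "type-b"]}
type_db = {
    "type-a": TypeRecord("type-a", "placeholder description A"),
    "type-b": TypeRecord("type-b", "placeholder description B"),
}

for record in compile_on_query("ni", lexicon, type_db):
    print(record.name, "-", record.description)
```

One plausible reason for compiling on demand rather than storing the result is that the type database and the lexicon stay the single sources of truth, so an edit to either is reflected in the next query without a rebuild step.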

Acknowledgements

We would like to thank the other members of the NTT Natural Language Group, Dan Flickinger, Stephan Oepen, and Jason Katz-Brown for stimulating discussions.

Author information

Corresponding author

Correspondence to Chikara Hashimoto.

About this article

Cite this article

Hashimoto, C., Bond, F., Tanaka, T. et al. Semi-automatic documentation of an implemented linguistic grammar augmented with a treebank. Lang Resources & Evaluation 42, 117–126 (2008). https://doi.org/10.1007/s10579-008-9065-9

  • DOI: https://doi.org/10.1007/s10579-008-9065-9
