Using a Partially Annotated Corpus to Build a Dependency Parser for Japanese

  • Conference paper
Natural Language Processing – IJCNLP 2005 (IJCNLP 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Abstract

We explore the use of a partially annotated corpus to build a dependency parser for Japanese, and we examine two types of partially annotated corpora. A parser trained on a corpus that has no grammatical tags for words achieves an accuracy of 87.38%, which is comparable to the current state-of-the-art accuracy on the Kyoto University Corpus. In contrast, a parser trained on a corpus annotated only with dependencies for each pair of adjacent bunsetsus (chunks) shows moderate performance. Notably, features based on character n-grams prove very useful for Japanese dependency parsing.
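The abstract highlights two technical ideas: features built from character n-grams (which need no word segmentation or grammatical tags) and a corpus in which dependencies are annotated only for adjacent bunsetsu pairs. The Python sketch below illustrates both under our own assumptions. The function names, the n-gram length cap of 3, the "M:"/"H:" feature prefixes, and the 0/1 labeling of adjacent pairs are all hypothetical choices made for illustration. They are not taken from the paper, whose actual feature set and learning method this page does not describe.

```python
from collections import Counter


def char_ngrams(text: str, n_max: int = 3) -> Counter:
    """Count all character n-grams of length 1..n_max in `text`.

    Hypothetical helper: the features come straight from raw characters,
    so no word segmentation or part-of-speech tags are required.
    """
    grams = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams


def pair_features(modifier: str, head: str, n_max: int = 3) -> set[str]:
    """Binary features for a candidate (modifier, head) bunsetsu pair.

    Assumption: each n-gram is prefixed with its source ("M:" for the
    modifier, "H:" for the candidate head) so a linear classifier can
    tell the two bunsetsus' features apart.
    """
    feats = {f"M:{g}" for g in char_ngrams(modifier, n_max)}
    feats |= {f"H:{g}" for g in char_ngrams(head, n_max)}
    return feats


def adjacent_labels(heads: list[int]) -> list[int]:
    """Reduce a full dependency tree to adjacent-pair labels.

    heads[i] is the index of the bunsetsu that bunsetsu i modifies
    (-1 for the sentence-final bunsetsu). Assumed labeling: 1 if
    bunsetsu i modifies bunsetsu i + 1, otherwise 0.
    """
    return [1 if heads[i] == i + 1 else 0 for i in range(len(heads) - 1)]


if __name__ == "__main__":
    bunsetsus = ["彼は", "本を", "読んだ"]  # "He read a book."
    heads = [2, 2, -1]  # both "彼は" and "本を" modify "読んだ"
    print(adjacent_labels(heads))  # [0, 1]
    print(sorted(pair_features(bunsetsus[1], bunsetsus[2], n_max=2)))
```

In this toy sentence only the second adjacent pair ("本を" → "読んだ") carries a dependency label of 1; the first bunsetsu's head is two positions away, so an adjacent-only annotation cannot express it. This illustrates one plausible reason why a corpus restricted to adjacent pairs supplies less supervision than a fully annotated one.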

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sassano, M. (2005). Using a Partially Annotated Corpus to Build a Dependency Parser for Japanese. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_8

  • DOI: https://doi.org/10.1007/11562214_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29172-5

  • Online ISBN: 978-3-540-31724-1

  • eBook Packages: Computer Science, Computer Science (R0)
