Abstract
We explore the use of a partially annotated corpus to build a dependency parser for Japanese, and we examine two types of partially annotated corpora. A parser trained on a corpus that has no grammatical tags for words achieves an accuracy of 87.38%, which is comparable to the current state-of-the-art accuracy on the Kyoto University Corpus. In contrast, a parser trained on a corpus that has dependency annotations only for each pair of adjacent bunsetsus (chunks) shows moderate performance. Nonetheless, features based on character n-grams prove very useful for a dependency parser for Japanese.
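To make the idea of character n-gram features concrete, the following is a minimal sketch of how such features might be read off the surface string of a bunsetsu when no part-of-speech tags are available. The function names, the feature-name format, and the choice of the first and last n characters are illustrative assumptions, not the feature set used in the paper.

```python
from typing import List


def char_ngrams(text: str, n: int) -> List[str]:
    """Return all contiguous character n-grams of text, left to right."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def bunsetsu_features(bunsetsu: str, max_n: int = 2) -> List[str]:
    """Build hypothetical features from the surface characters of a bunsetsu.

    Assumption for illustration only: the leading and trailing n-grams are
    used as features, since the end of a bunsetsu often carries particles.
    """
    feats: List[str] = []
    for n in range(1, max_n + 1):
        grams = char_ngrams(bunsetsu, n)
        if grams:  # skip n-gram sizes longer than the bunsetsu itself
            feats.append(f"head_{n}gram={grams[0]}")   # first n characters
            feats.append(f"tail_{n}gram={grams[-1]}")  # last n characters
    return feats


if __name__ == "__main__":
    for feature in bunsetsu_features("太郎は"):
        print(feature)
```

Features of this kind need only the raw characters and the bunsetsu boundaries, which is why they remain usable on a corpus without grammatical tags.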
© 2005 Springer-Verlag Berlin Heidelberg
Sassano, M. (2005). Using a Partially Annotated Corpus to Build a Dependency Parser for Japanese. In: Dale, R., Wong, K.-F., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_8
DOI: https://doi.org/10.1007/11562214_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)