
Dependency parsing of Japanese monologue using clause boundaries

Language Resources and Evaluation

Abstract

Spoken monologues feature longer sentences and greater structural complexity than spoken dialogues. To achieve high parsing performance on spoken monologues, it may be effective to simplify sentence structure by dividing each sentence into suitable linguistic units. This paper proposes a method for dependency parsing of Japanese spoken monologues based on sentence segmentation. In this method, dependency parsing is executed in two stages: at the clause level and at the sentence level. First, dependencies within a clause are identified by dividing a sentence into clauses and executing stochastic dependency parsing for each clause. Next, dependencies across clause boundaries are identified stochastically, which completes the dependency structure of the entire sentence. An experiment using a spoken monologue corpus shows the effectiveness of this method for efficient dependency parsing of Japanese monologue sentences.
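The two stages described in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation. The scoring function `logp(d, h)`, meaning the log-probability that bunsetsu `d` depends on bunsetsu `h`, is a hypothetical stand-in for the paper's stochastic model. The search also assumes the constraints commonly used in Japanese dependency parsing: every bunsetsu except the last depends on exactly one bunsetsu to its right, and dependencies do not cross. As a further simplification, stage 2 here lets a clause-final bunsetsu depend only on the final bunsetsu of a later clause. The method described above identifies dependencies across clause boundaries more generally.

```python
from functools import lru_cache
from typing import Callable, List, Tuple

LogP = Callable[[int, int], float]  # hypothetical scorer: log P(d depends on h)


def parse_head_final(n: int, logp: LogP) -> Tuple[float, List[int]]:
    """Exact search for the best non-crossing, head-final dependency structure.

    Returns (score, heads) where heads[i] > i for every i < n - 1 and heads[n - 1] == -1.
    """

    @lru_cache(maxsize=None)
    def complete(i: int, j: int):
        # Best subtree rooted at bunsetsu j that covers exactly bunsetsus i..j.
        if i == j:
            return 0.0, ()
        return attach(i, j - 1, j)

    @lru_cache(maxsize=None)
    def attach(i: int, k: int, h: int):
        # Best way to cover i..k with adjacent complete subtrees whose roots all depend on h.
        if k < i:
            return 0.0, ()
        best = (float("-inf"), ())
        for a in range(i, k + 1):  # the last subtree spans a..k and is rooted at k
            left_score, left_arcs = attach(i, a - 1, h)
            sub_score, sub_arcs = complete(a, k)
            score = left_score + sub_score + logp(k, h)
            if score > best[0]:
                best = (score, left_arcs + sub_arcs + ((k, h),))
        return best

    score, arcs = complete(0, n - 1)
    heads = [-1] * n
    for d, h in arcs:
        heads[d] = h
    return score, heads


def two_stage_parse(clause_sizes: List[int], logp: LogP) -> List[int]:
    """Stage 1 parses inside each (non-empty) clause; stage 2 links clause-final bunsetsus."""
    clauses, start = [], 0
    for size in clause_sizes:
        clauses.append(list(range(start, start + size)))
        start += size
    heads = [-1] * start

    # Stage 1: dependencies within each clause (its final bunsetsu stays unattached).
    for clause in clauses:
        _, local = parse_head_final(len(clause), lambda d, h: logp(clause[d], clause[h]))
        for d, h in enumerate(local):
            if h >= 0:
                heads[clause[d]] = clause[h]

    # Stage 2 (simplified): clause-final bunsetsus depend on later clause-final bunsetsus.
    finals = [clause[-1] for clause in clauses]
    _, local = parse_head_final(len(finals), lambda d, h: logp(finals[d], finals[h]))
    for d, h in enumerate(local):
        if h >= 0:
            heads[finals[d]] = finals[h]
    return heads


# Toy scorer (an assumption for illustration): nearer heads are more probable.
print(two_stage_parse([3, 2], lambda d, h: -(h - d)))  # [1, 2, 4, 4, -1]
```

In this sketch the exact search in stage 1 runs only over the bunsetsus of one clause at a time. This gives one intuition for why segmenting long monologue sentences at clause boundaries can reduce the cost of the search.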


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9


Notes

  1. The average sentence length in the spoken monologue corpus ‘Asu-Wo-Yomu’ used in our research was 29.1 morphemes per sentence. In two dialogue corpora, ‘SLDB’ (Morimoto et al. 1994) and ‘BTEC’ (Takezawa et al. 2002), it was 11.7 and 7.9 morphemes per sentence, respectively. It has also been reported that this monologue corpus contains extremely long sentences of more than 100 morphemes (Kashioka and Maruyama 2004).

  2. Bunsetsu is a linguistic unit in Japanese that roughly corresponds to a basic phrase in English. A bunsetsu consists of one independent word and zero or more ancillary words. A dependency is a modification relation in which a modifier bunsetsu depends on a modified bunsetsu; that is, the two act as modifier and modifiee, respectively.

  3. The labels include a few other constituents that do not strictly represent clause boundaries but can be regarded as syntactically independent elements, such as ‘topicalized element wa,’ ‘conjunctives,’ and so on. The following sentence contains these clause boundaries: Soshite mittu-me-wa shohi-sha-kyoiku-desu (And the third is consumer education).
     Soshite (and) /Conjunctives/ mittu-me-wa (the third) /Topicalized element wa/ shohi-sha-kyoiku-desu (is consumer education) /Sentence end/

  4. Asu-Wo-Yomu is a collection of transcriptions of a TV commentary program of the Japan Broadcasting Corporation (NHK). The commentator speaks on current social issues for 10 min.

  5. It is difficult to divide a monologue into sentences in advance because monologues contain no clear sentence breaks. However, since methods for detecting sentence boundaries have already been proposed (Shriberg et al. 2000; Kim and Woodland 2001; Huang and Zweig 2002; Shitaoka et al. 2004), we assume that sentence boundaries can be detected automatically before dependency parsing.

  6. We analyzed the 200 sentences described in Sect. 2.3 and confirmed that 70.6% (522/751) of the final bunsetsus of the clause boundary units depended on the final bunsetsus of other clause boundary units.

  7. These annotations follow the specifications described in Sect. 2.3.

  8. Except for fillers, our method treats grammatically ill-formed linguistic phenomena in the same way as normal bunsetsus. In our experiment, however, this treatment has almost no influence on parsing accuracy, because such phenomena (other than fillers) are rarely found in the monologue corpus ‘Asu-Wo-Yomu.’

  9. We regard the label of the end boundary of a clause boundary unit as the type of that unit.
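Notes 2, 3 and 9 define the units the parser works with. The following Python sketch shows one possible representation, under assumptions not stated in the notes. Each bunsetsu stores the clause boundary label that follows it, if any, in a hypothetical `boundary` field. A sentence's dependency structure is a list `heads` in which `heads[i]` is the index of the bunsetsu that bunsetsu `i` modifies.

`split_units` groups bunsetsus into clause boundary units and takes each unit's type from its end-boundary label, as in note 9. `is_well_formed` checks the constraints commonly assumed in Japanese dependency parsing: each bunsetsu except the last modifies exactly one bunsetsu to its right, and dependencies do not cross.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bunsetsu:
    text: str
    boundary: Optional[str] = None  # hypothetical field: clause boundary label after this bunsetsu


def split_units(sentence: List[Bunsetsu]) -> List[Tuple[str, List[int]]]:
    """Split a sentence into clause boundary units, typed by their end-boundary label."""
    units, current = [], []
    for i, bunsetsu in enumerate(sentence):
        current.append(i)
        if bunsetsu.boundary is not None:
            units.append((bunsetsu.boundary, current))
            current = []
    if current:  # trailing bunsetsus with no closing label (should not occur at a sentence end)
        units.append(("(none)", current))
    return units


def is_well_formed(heads: List[int]) -> bool:
    """Check head-final, single-head, non-crossing dependencies; the last bunsetsu has head -1."""
    n = len(heads)
    if n == 0 or heads[-1] != -1:
        return False
    for i, h in enumerate(heads[:-1]):
        if not (i < h < n):  # every modifier must depend on a bunsetsu to its right
            return False
    arcs = list(enumerate(heads[:-1]))
    for a, b in arcs:
        for c, d in arcs:
            if a < c < b < d:  # arcs (a, b) and (c, d) cross
                return False
    return True


sentence = [
    Bunsetsu("Soshite", "Conjunctives"),
    Bunsetsu("mittu-me-wa", "Topicalized element wa"),
    Bunsetsu("shohi-sha-kyoiku-desu", "Sentence end"),
]
for unit_type, members in split_units(sentence):
    print(unit_type, members)
# Assumed analysis: both earlier bunsetsus modify the sentence-final predicate.
print(is_well_formed([2, 2, -1]))  # True
```

The example sentence is the one from note 3. The heads passed to `is_well_formed` are an assumed analysis for illustration, not an annotation taken from the corpus.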

References

  • Agarwal, R., & Boggess, L. (1992). A simple but useful approach to conjunct identification. In Proceedings of 30th ACL (pp. 15–21).

  • Asahara, M., & Matsumoto, Y. (2003). Filler and disfluency identification based on morphological analysis and chunking. In Proceedings of ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition (pp. 163–166).

  • Bear, J., & Price, P. (1990). Prosody, syntax, and parsing. In Proceedings of 28th ACL (pp. 17–22).

  • Charniak, E. (2000). A maximum-entropy-inspired parser. In Proceedings of 1st NAACL (pp. 132–139).

  • Collins, M. (1996). A new statistical parser based on bigram lexical dependencies. In Proceedings of 34th ACL (pp. 184–191).

  • Core, M. G., & Schubert, L. K. (1999). A syntactic framework for speech repairs and other disruptions. In Proceedings of 37th ACL (pp. 413–420).

  • Delmonte, R. (2003). Parsing spontaneous speech. In Proceedings of 8th EUROSPEECH (pp. 1999–2004).

  • Fujio, M., & Matsumoto, Y. (1998). Japanese dependency structure analysis based on lexicalized statistics. In Proceedings of 3rd EMNLP (pp. 87–96).

  • Hindle, D. (1983). Deterministic parsing of syntactic nonfluencies. In Proceedings of 21st ACL (pp. 123–128).

  • Huang, J., & Zweig, G. (2002). Maximum entropy model for punctuation annotation from speech. In Proceedings of 7th ICSLP (pp. 917–920).

  • Kashioka, H., & Maruyama, T. (2004). Segmentation of semantic units in Japanese monologues. In Proceedings of ICSLT-O-COCOSDA 2004 (pp. 87–92).

  • Kim, J., & Woodland, P. C. (2001). The use of prosody in a combined system for punctuation generation and speech recognition. In Proceedings of 7th EUROSPEECH (pp. 2757–2760).

  • Kim, M., & Lee, J. (2004). Syntactic analysis of long sentences based on s-clauses. In Proceedings of 1st IJCNLP (pp. 518–526).

  • Kudo, T., & Matsumoto, Y. (2002). Japanese dependency analysis using cascaded chunking. In Proceedings of 6th CoNLL (pp. 63–69).

  • Kurohashi, S., & Nagao, M. (1994). A syntactic analysis method of long Japanese sentences based on the detection of conjunctive structures. Computational Linguistics, 20(4), 507–534.


  • Kurohashi, S., & Nagao, M. (1998). Building a Japanese parsed corpus while improving the parsing system. In Proceedings of 1st LREC (pp. 719–724).

  • Maekawa, K., Koiso, H., Furui, S., & Isahara, H. (2000). Spontaneous speech corpus of Japanese. In Proceedings of 2nd LREC (pp. 947–952).

  • Maruyama, T., Kashioka, H., Kumano, T., & Tanaka, H. (2004). Development and evaluation of Japanese clause boundaries annotation program. Journal of Natural Language Processing, 11(3), 39–68. (In Japanese)


  • Matsumoto, Y., Kitauchi, A., Yamashita, T., & Hirano, Y. (1999). Japanese morphological analysis system ChaSen version 2.0 manual. NAIST Technical Report, NAIST-IS-TR99009.

  • Morimoto, T., Uratani, N., Takezawa, T., Furuse, O., Sobashima, Y., Iida, H., Nakamura, A., Sagisaka, Y., Higuchi, N., & Yamazaki, Y. (1994). A speech and language database for speech translation research. In Proceedings of 3rd ICSLP (pp. 1791–1794).

  • Ohno, T., Matsubara, S., Kashioka, H., Kato, N., & Inagaki, Y. (2005a). Incremental dependency parsing of Japanese spoken monologue based on clause boundaries. In Proceedings of 9th EUROSPEECH (pp. 3449–3452).

  • Ohno, T., Matsubara, S., Kawaguchi, N., & Inagaki, Y. (2005b). Robust dependency parsing of spontaneous Japanese spoken language. IEICE Transactions on Information and Systems, E88-D(3), 545–552.


  • Ratnaparkhi, A. (1997). A linear observed time statistical parser based on maximum entropy models. In Proceedings of 2nd EMNLP (pp. 1–10).

  • Shirai, S., Ikehara, S., Yokoo, A., & Kimura, J. (1995). A new dependency analysis method based on semantically embedded sentence structures and its performance on Japanese subordinate clause. Journal of Information Processing Society of Japan, 36(10), 2353–2361. (In Japanese)


  • Shitaoka, K., Uchimoto, K., Kawahara, T., & Isahara, H. (2004). Dependency structure analysis and sentence boundary detection in spontaneous Japanese. In Proceedings of 20th COLING (pp. 1107–1113).

  • Shriberg, E., Stolcke, A., Hakkani-Tur, D., & Tur, G. (2000). Prosody-based automatic segmentation of speech into sentences and topics. Speech Communication, 32(1–2), 127–154.


  • Stolcke, A., & Shriberg, E. (1996). Statistical language modeling for speech disfluencies. In Proceedings of ICASSP-96 (pp. 405–408).

  • Takezawa, T., Sumita, E., Sugaya, F., Yamamoto, H., & Yamamoto, S. (2002). Toward a broad-coverage bilingual corpus for speech translation of travel conversations in the real world. In Proceedings of 3rd LREC (pp. 147–152).

  • Uchimoto, K., Sekine, S., & Isahara, H. (1999). Japanese dependency structure analysis based on maximum entropy models. In Proceedings of 9th EACL (pp. 196–203).

  • Utsuro, T., Nishiokayama, S., Fujio, M., & Matsumoto, Y. (2000). Analyzing dependencies of Japanese subordinate clauses based on statistics of scope embedding preference. In Proceedings of 1st NAACL (pp. 110–117).


Acknowledgements

The authors would like to thank Prof. Toshiki Sakabe of the Graduate School of Information Science, Nagoya University, for his valuable advice. This research was supported in part by a contract with the Strategic Information and Communications R & D Promotion Programme, Ministry of Internal Affairs and Communications, and by a Grant-in-Aid for Young Scientists of JSPS. The first author was partially supported by JSPS Research Fellowships for Young Scientists.

Author information


Correspondence to Tomohiro Ohno.


About this article

Cite this article

Ohno, T., Matsubara, S., Kashioka, H. et al. Dependency parsing of Japanese monologue using clause boundaries. Lang Resources & Evaluation 40, 263–279 (2006). https://doi.org/10.1007/s10579-007-9023-y



  • DOI: https://doi.org/10.1007/s10579-007-9023-y
