Dependency parsing of Japanese monologue using clause boundaries

Ohno, Tomohiro; Matsubara, Shigeki; Kashioka, Hideki; Maruyama, Takehiko; Tanaka, Hideki; Inagaki, Yasuyoshi

doi:10.1007/s10579-007-9023-y

Published: 12 July 2007

Language Resources and Evaluation, Volume 40, pages 263–279 (2006)

Abstract

Spoken monologues feature greater sentence length and structural complexity than spoken dialogues. To achieve high parsing performance for spoken monologues, simplifying the structure by dividing a sentence into suitable language units could prove effective. This paper proposes a method for dependency parsing of Japanese spoken monologues based on sentence segmentation. In this method, dependency parsing is executed in two stages: at the clause level and at the sentence level. First, dependencies within a clause are identified by dividing a sentence into clauses and executing stochastic dependency parsing for each clause. Next, dependencies across clause boundaries are identified stochastically, and the dependency structure of the entire sentence is thus completed. An experiment using a spoken monologue corpus shows the effectiveness of this method for efficient dependency parsing of Japanese monologue sentences.
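
The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It assumes that a sentence is given as a list of bunsetsus, that clause boundaries are already known, and that every bunsetsu except the last depends on exactly one later bunsetsu. The names `parse_two_stage` and `toy_score`, the distance-based scorer, and the greedy choice of the best-scoring head are illustrative assumptions, not the authors' probabilistic model; constraints such as non-crossing dependencies are not enforced here.

```python
from typing import Callable, Dict, List

# A scorer returns a pseudo-probability that bunsetsu `mod` depends on `head`.
Scorer = Callable[[List[str], int, int], float]


def toy_score(bunsetsus: List[str], mod: int, head: int) -> float:
    """Hypothetical stand-in for a learned model: prefer nearer heads."""
    return 1.0 / (head - mod)


def parse_two_stage(
    bunsetsus: List[str],
    clause_ends: List[int],
    score: Scorer = toy_score,
) -> Dict[int, int]:
    """Return a mapping {modifier index: head index}.

    clause_ends holds the index of the final bunsetsu of each clause,
    in ascending order; the last entry must be the sentence-final bunsetsu.
    """
    last = len(bunsetsus) - 1
    if not clause_ends or clause_ends[-1] != last:
        raise ValueError("the last clause must end at the sentence end")

    heads: Dict[int, int] = {}

    # Stage 1 (clause level): every non-final bunsetsu of a clause
    # picks its head among the later bunsetsus of the same clause.
    start = 0
    for end in clause_ends:
        for mod in range(start, end):
            candidates = range(mod + 1, end + 1)
            heads[mod] = max(candidates, key=lambda h: score(bunsetsus, mod, h))
        start = end + 1

    # Stage 2 (sentence level): the final bunsetsu of each clause
    # picks its head among the bunsetsus of the following clauses.
    for end in clause_ends:
        if end == last:
            continue  # the sentence-final bunsetsu has no head
        candidates = range(end + 1, last + 1)
        heads[end] = max(candidates, key=lambda h: score(bunsetsus, end, h))

    return heads


# Five placeholder bunsetsus split into two clauses: [b0 b1] and [b2 b3 b4].
result = parse_two_stage(["b0", "b1", "b2", "b3", "b4"], clause_ends=[1, 4])
```

Restricting the candidate heads in the first stage to the current clause is what shortens the search compared with parsing the whole sentence at once; only the clause-final bunsetsus are left for the second stage.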

Notes

The average sentence length in the spoken monologue corpus ‘Asu-Wo-Yomu’ used in our research was 29.1 morphemes per sentence; in the two dialogue corpora ‘SLDB’ (Morimoto et al. 1994) and ‘BTEC’ (Takezawa et al. 2002), it was 11.7 and 7.9 morphemes per sentence, respectively. In addition, it has been reported that this monologue corpus contains extremely long sentences of more than 100 morphemes (Kashioka and Maruyama 2004).
Bunsetsu is a linguistic unit in Japanese that roughly corresponds to a basic phrase in English. A bunsetsu consists of one independent word and zero or more ancillary words. A dependency is a modification relation in which a modifier bunsetsu depends on a modified bunsetsu. That is, the modifier bunsetsu and the modified bunsetsu work as modifier and modifyee, respectively.
The labels include a few other constituents that do not strictly represent clause boundaries but can be regarded as syntactically independent elements, such as ‘topicalized element wa,’ ‘conjunctives,’ and so on. The following example is a sentence that has these clause boundaries: Soshite mittu-me-wa shohi-sha-kyoiku-desu (And the third is consumer education). Soshite (and) /Conjunctives/ mittu-me-wa (the third) /Topicalized element wa/ shohi-sha-kyoiku-desu (is consumer education) /Sentence end/
Asu-Wo-Yomu is a collection of transcriptions of a TV commentary program of the Japan Broadcasting Corporation (NHK). The commentator speaks on current social issues for 10 min.
It is difficult to preliminarily divide a monologue into sentences because there are no clear sentence breaks in the monologues. However, since methods for detecting sentence boundaries have already been proposed (Shriberg et al. 2000; Kim and Woodland 2001; Huang and Zweig 2002; Shitaoka et al. 2004), we assume that they can be detected automatically before dependency parsing.
We analyzed the 200 sentences described in Sect. 2.3 and confirmed that 70.6% (522/751) of the final bunsetsus of the clause boundary units depended on the final bunsetsus of other clause boundary units.
The specifications of these annotations reflect those described in Sect. 2.3.
Our method treats grammatically ill-formed linguistic phenomena, other than fillers, in the same way as normal bunsetsus. In our experiment, however, this treatment has almost no influence on parsing accuracy because such phenomena are rarely found in the monologue corpus ‘Asu-Wo-Yomu.’
We regard the label name provided for the end boundary of a clause boundary unit as that unit’s type.
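
The clause boundary labels and unit types described in the notes above can be illustrated with a small sketch. It assumes the input is a list of bunsetsus, each paired with the boundary label that follows it (or `None` when no boundary follows), and it uses the example sentence given in the notes. The names `ClauseBoundaryUnit` and `split_into_units` are hypothetical and are not taken from the authors' annotation program.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ClauseBoundaryUnit:
    bunsetsus: List[str]
    unit_type: str  # label attached to the unit's end boundary


def split_into_units(
    labeled: List[Tuple[str, Optional[str]]]
) -> List[ClauseBoundaryUnit]:
    """Group bunsetsus into units, closing a unit at each boundary label."""
    units: List[ClauseBoundaryUnit] = []
    current: List[str] = []
    for bunsetsu, label in labeled:
        current.append(bunsetsu)
        if label is not None:
            # The end-boundary label becomes the type of the unit.
            units.append(ClauseBoundaryUnit(current, label))
            current = []
    if current:
        raise ValueError("the sequence must end with a boundary label")
    return units


# Example sentence from the notes: "And the third is consumer education."
example = [
    ("Soshite", "Conjunctives"),
    ("mittu-me-wa", "Topicalized element wa"),
    ("shohi-sha-kyoiku-desu", "Sentence end"),
]
units = split_into_units(example)
```

In this example each bunsetsu forms its own clause boundary unit, so the three units have the types 'Conjunctives', 'Topicalized element wa', and 'Sentence end'.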

References

Agarwal, R., & Boggess, L. (1992). A simple but useful approach to conjunct identification. In Proceedings of 30th ACL (pp. 15–21).
Asahara, M., & Matsumoto, Y. (2003). Filler and disfluency identification based on morphological analysis and chunking. In Proceedings of ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition (pp. 163–166).
Bear, J., & Price, P. (1990). Prosody, syntax, and parsing. In Proceedings of 28th ACL (pp. 17–22).
Charniak, E. (2000). A maximum-entropy-inspired parser. In Proceedings of 1st NAACL (pp. 132–139).
Collins, M. (1996). A new statistical parser based on bigram lexical dependencies. In Proceedings of 34th ACL (pp. 184–191).
Core, M. G., & Schubert, L. K. (1999). A syntactic framework for speech repairs and other disruptions. In Proceedings of 37th ACL (pp. 413–420).
Delmonte, R. (2003). Parsing spontaneous speech. In Proceedings of 8th EUROSPEECH (pp. 1999–2004).
Fujio, M., & Matsumoto, Y. (1998). Japanese dependency structure analysis based on lexicalized statistics. In Proceedings of 3rd EMNLP (pp. 87–96).
Hindle, D. (1983). Deterministic parsing of syntactic nonfluencies. In Proceedings of 21st ACL (pp. 123–128).
Huang, J., & Zweig, G. (2002). Maximum entropy model for punctuation annotation from speech. In Proceedings of 7th ICSLP (pp. 917–920).
Kashioka, H., & Maruyama, T. (2004). Segmentation of semantic units in Japanese monologues. In Proceedings of ICSLT-O-COCOSDA 2004 (pp. 87–92).
Kim, J., & Woodland, P. C. (2001). The use of prosody in a combined system for punctuation generation and speech recognition. In Proceedings of 7th EUROSPEECH (pp. 2757–2760).
Kim, M., & Lee, J. (2004). Syntactic analysis of long sentences based on s-clauses. In Proceedings of 1st IJCNLP (pp. 518–526).
Kudo, T., & Matsumoto, Y. (2002). Japanese dependency analysis using cascaded chunking. In Proceedings of 6th CoNLL (pp. 63–69).
Kurohashi, S., & Nagao, M. (1994). A syntactic analysis method of long Japanese sentences based on the detection of conjunctive structures. Computational Linguistics, 20(4), 507–534.
Kurohashi, S., & Nagao, M. (1998). Building a Japanese parsed corpus while improving the parsing system. In Proceedings of 1st LREC (pp. 719–724).
Maekawa, K., Koiso, H., Furui, S., & Isahara, H. (2000). Spontaneous speech corpus of Japanese. In Proceedings of 2nd LREC (pp. 947–952).
Maruyama, T., Kashioka, H., Kumano, T., & Tanaka, H. (2004). Development and evaluation of Japanese clause boundaries annotation program. Journal of Natural Language Processing, 11(3), 39–68. (In Japanese)
Matsumoto, Y., Kitauchi, A., Yamashita, T., & Hirano, Y. (1999). Japanese morphological analysis system ChaSen version 2.0 manual. NAIST Technical Report, NAIST-IS-TR99009.
Morimoto, T., Uratani, N., Takezawa, T., Furuse, O., Sobashima, Y., Iida, H., Nakamura, A., Sagisaka, Y., Higuchi, N., & Yamazaki, Y. (1994). A speech and language database for speech translation research. In Proceedings of 3rd ICSLP (pp. 1791–1794).
Ohno, T., Matsubara, S., Kashioka, H., Kato, N., & Inagaki, Y. (2005a). Incremental dependency parsing of Japanese spoken monologue based on clause boundaries. In Proceedings of 9th EUROSPEECH (pp. 3449–3452).
Ohno, T., Matsubara, S., Kawaguchi, N., & Inagaki, Y. (2005b). Robust dependency parsing of spontaneous Japanese spoken language. IEICE Transactions on Information and Systems, E88-D(3), 545–552.
Ratnaparkhi, A. (1997). A linear observed time statistical parser based on maximum entropy models. In Proceedings of 2nd EMNLP (pp. 1–10).
Shirai, S., Ikehara, S., Yokoo, A., & Kimura, J. (1995). A new dependency analysis method based on semantically embedded sentence structures and its performance on Japanese subordinate clause. Journal of Information Processing Society of Japan, 36(10), 2353–2361. (In Japanese)
Shitaoka, K., Uchimoto, K., Kawahara, T., & Isahara, H. (2004). Dependency structure analysis and sentence boundary detection in spontaneous Japanese. In Proceedings of 20th COLING (pp. 1107–1113).
Shriberg, E., Stolcke, A., Hakkani-Tur, D., & Tur, G. (2000). Prosody-based automatic segmentation of speech into sentences and topics. Speech Communication, 32(1–2), 127–154.
Stolcke, A., & Shriberg, E. (1996). Statistical language modeling for speech disfluencies. In Proceedings of ICASSP-96 (pp. 405–408).
Takezawa, T., Sumita, E., Sugaya, F., Yamamoto, H., & Yamamoto, S. (2002). Toward a broad-coverage bilingual corpus for speech translation of travel conversations in the real world. In Proceedings of 3rd LREC (pp. 147–152).
Uchimoto, K., Sekine, S., & Isahara, H. (1999). Japanese dependency structure analysis based on maximum entropy models. In Proceedings of 9th EACL (pp. 196–203).
Utsuro, T., Nishiokayama, S., Fujio, M., & Matsumoto, Y. (2000). Analyzing dependencies of Japanese subordinate clauses based on statistics of scope embedding preference. In Proceedings of 1st NAACL (pp. 110–117).

Acknowledgements

The authors would like to thank Prof. Toshiki Sakabe of the Graduate School of Information Science, Nagoya University, for his valuable advice. This research was supported in part by a contract with the Strategic Information and Communications R & D Promotion Programme, Ministry of Internal Affairs and Communications, and by a Grant-in-Aid for Young Scientists of JSPS. The first author was partially supported by JSPS Research Fellowships for Young Scientists.

Author information

Authors and Affiliations

Department of Information Engineering, Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 464-8603, Japan
Tomohiro Ohno
Information Technology Center, Nagoya University, Nagoya, Japan
Shigeki Matsubara
ATR Spoken Language Communication Research Laboratories, Kyoto, Japan
Hideki Kashioka
The National Institute for Japanese Language, Tokyo, Japan
Takehiko Maruyama
NHK Science & Technical Research Laboratories, Tokyo, Japan
Hideki Tanaka
Faculty of Information Science and Technology, Aichi Prefectural University, Aichi, Japan
Yasuyoshi Inagaki

Corresponding author

Correspondence to Tomohiro Ohno.

About this article

Cite this article

Ohno, T., Matsubara, S., Kashioka, H. et al. Dependency parsing of Japanese monologue using clause boundaries. Lang Resources & Evaluation 40, 263–279 (2006). https://doi.org/10.1007/s10579-007-9023-y

Received: 25 August 2006
Accepted: 14 May 2007
Published: 12 July 2007
Issue Date: December 2006
DOI: https://doi.org/10.1007/s10579-007-9023-y
