Accent Phrase Segmentation by F0 Clustering Using Superpositional Modelling

Nakai, Mitsuru; Singer, Harald; Sagisaka, Yoshinori; Shimodaira, Hiroshi

doi:10.1007/978-1-4612-2258-3_22

Abstract

We propose an automatic method for detecting minor phrase boundaries in Japanese continuous speech using F₀ information. In the training phase, the F₀ contours of hand-labelled minor phrases are parameterized according to the superpositional model proposed by Fujisaki and Hirose and grouped into clusters; the model parameters of each reference template are calculated as an approximation of the corresponding cluster's centroid. In the segmentation phase, N-best boundary candidates are extracted automatically by one-stage dynamic programming (DP) matching between the reference templates and the target F₀ contour. About 90% of minor phrase boundaries were correctly detected in speaker-independent experiments on the ATR (Advanced Telecommunications Research Institute International) Japanese continuous speech database.
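As a rough illustration of the two phases, the sketch below generates ln F₀ templates from the widely published form of the Fujisaki superpositional model and then tiles a target contour with them by dynamic programming. It is not the chapter's implementation: the constants `ALPHA`, `BETA`, `GAMMA`, the base frequency, the frame period, the template parameters, and the helper names `make_template` and `segment` are all assumptions made for illustration. The DP here is a simplified segmental search with fixed-length templates; it omits the time warping of one-stage DP matching and returns only the single best path rather than N-best candidates.

```python
import math

# Assumed constants (commonly quoted defaults, not taken from the chapter).
ALPHA, BETA, GAMMA = 3.0, 20.0, 0.9


def phrase_response(t, alpha=ALPHA):
    """Phrase control impulse response Gp(t); zero before the command."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Accent control step response Ga(t), capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, fb, phrases, accents):
    """ln F0(t) = ln Fb + sum Ap*Gp(t-T0) + sum Aa*(Ga(t-T1) - Ga(t-T2))."""
    value = math.log(fb)
    for ap, t0 in phrases:
        value += ap * phrase_response(t - t0)
    for aa, t1, t2 in accents:
        value += aa * (accent_response(t - t1) - accent_response(t - t2))
    return value


def make_template(n_frames, accent, frame_s=0.01, fb=120.0, phrase=(0.3, 0.0)):
    """Hypothetical reference template: one phrase and one accent command."""
    return [log_f0(i * frame_s, fb, [phrase], [accent]) for i in range(n_frames)]


def segment(contour, templates):
    """Cheapest tiling of `contour` by whole templates (no time warping).

    Returns (total squared error, interior boundaries, template indices).
    """
    n = len(contour)
    inf = float("inf")
    best = [0.0] + [inf] * n      # best[e]: minimal cost of covering frames [0, e)
    back = [None] * (n + 1)       # back[e]: (start frame, template index)
    for end in range(1, n + 1):
        for k, tpl in enumerate(templates):
            start = end - len(tpl)
            if start < 0 or best[start] == inf:
                continue
            dist = sum((c - r) ** 2 for c, r in zip(contour[start:end], tpl))
            if best[start] + dist < best[end]:
                best[end] = best[start] + dist
                back[end] = (start, k)
    if best[n] == inf:
        raise ValueError("contour length cannot be tiled by the templates")
    starts, labels, end = [], [], n
    while end > 0:                # trace the best path back to frame 0
        start, k = back[end]
        starts.append(start)
        labels.append(k)
        end = start
    starts.reverse()
    labels.reverse()
    return best[n], starts[1:], labels


# Two illustrative templates of 30 and 40 frames (10 ms frames assumed).
tpl_a = make_template(30, accent=(0.4, 0.05, 0.20))
tpl_b = make_template(40, accent=(0.6, 0.10, 0.30))
contour = tpl_a + tpl_b + tpl_a   # synthetic target made of three minor phrases
cost, boundaries, labels = segment(contour, [tpl_a, tpl_b])
print(boundaries, labels)         # [30, 70] [0, 1, 0]
```

On this synthetic contour the search recovers the two interior boundaries exactly, because the target was built from the templates themselves. Real F₀ contours would need voicing and gap handling, duration variation (which is what the warping in one-stage DP provides), and templates derived from clustered training data.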

References

H. Fujisaki and K. Hirose. Analysis of voice fundamental frequency contours for declarative sentences of Japanese. J. Acoust. Soc. Japan (E), 5:233–242, 1984.
H. Fujisaki, K. Hirose, and H. Lei. Prosody and syntax in spoken sentences of Standard Chinese. In Proceedings of the International Conference on Spoken Language Processing, Banff, Canada, pp. 433–436, 1992.
H. Fujisaki. Dynamic characteristics of voice fundamental frequency in speech and singing. In P. MacNeilage, editor, The Production of Speech, pp. 39–55. Berlin: Springer-Verlag, 1983.
N. Higuchi, T. Hirai, and Y. Sagisaka. Effect of speaking style on parameters of voice fundamental frequency generation model. In Proceedings of the Conference IEICE, Vol. SA-5–3, pp. 488–489, 1994.
T. Hirai, N. Iwahashi, H. Valbret, N. Higuchi, and Y. Sagisaka. Fundamental frequency contour modelling using statistical analysis. In Proceedings of the Acoust. Soc. Jpn. Autumn 93, pp. 225–226, 1993.
A. Komatsu, E. Oohira, and A. Ichikawa. Conversational speech understanding based on sentence structure inference using prosodics, and word spotting. Trans. IEICE, (D), J71-D:1218–1228, 1988.
Y. Linde, A. Buzo, and R. M. Gray. An algorithm for vector quantizer design. IEEE Trans. Commun., COM-28:84–95, 1980.
W. A. Lea, M. F. Medress, and T. E. Skinner. A prosodically guided speech understanding strategy. IEEE Trans. Acoust., Speech, Signal Processing, ASSP-23:30–38, 1975.
H. Ney. The use of a one-stage dynamic programming algorithm for connected word recognition. IEEE Trans. Acoust., Speech, Signal Processing, ASSP-32:263–271, 1984.
M. Nakai and H. Shimodaira. Accent phrase segmentation by finding N-best sequences of pitch pattern templates. In Proceedings of the International Conference on Spoken Language Processing, Yokohama, Japan, Vol. 1, pp. 347–350, 1994.
R. Schwartz and Y. L. Chow. The N-best algorithm: an efficient and exact procedure for finding the N most likely sentence hypotheses. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Vol. S2.12, pp. 81–84, 1990.
S. Sagayama and S. Furui. A technique for pitch extraction by lag-window method. In Proceedings of the Conference IEICE, 1235, 1978.
H. Shimodaira, M. Kimura, and S. Sagayama. Phrase segmentation of continuous speech by pitch contour DP matching. In Papers of Technical Group on Speech, Vol. SP90-72. IEICE, 1990.
Y. Suzuki, Y. Sekiguchi, and M. Shigenaga. Detection of phrase boundaries using prosodics for continuous speech recognition. Trans. IEICE, (D-II), J72-D-II: 1606–1617, 1989.
Y. Sagisaka, K. Takeda, M. Abe, S. Katagiri, T. Umeda, and H. Kuwabara. A large-scale Japanese speech database. In Proceedings of the International Conference on Spoken Language Processing, Kobe, Japan, pp. 1089–1092, 1990.
T. Ukita, S. Nakagawa, and T. Sakai. A use of pitch contour in recognizing spoken Japanese arithmetic expressions. Trans. IEICE, (D), J63-D:954–961, 1980.
C. W. Wightman and M. Ostendorf. Automatic recognition of prosodic phrases. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pp. 321–324, 1991.

Editor information

Editors and Affiliations

ATR Interpreting Telecommunications Research Labs, 2-2, Hikaridai, Seika-cho, Soraku-gun, 619-02, Kyoto, Japan
Yoshinori Sagisaka, Nick Campbell & Norio Higuchi

About this chapter

Cite this chapter

Nakai, M., Singer, H., Sagisaka, Y., Shimodaira, H. (1997). Accent Phrase Segmentation by F₀ Clustering Using Superpositional Modelling. In: Sagisaka, Y., Campbell, N., Higuchi, N. (eds) Computing Prosody. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2258-3_22

DOI: https://doi.org/10.1007/978-1-4612-2258-3_22
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-7476-6
Online ISBN: 978-1-4612-2258-3
eBook Packages: Springer Book Archive
