Abstract
We propose an automatic method for detecting minor phrase boundaries in Japanese continuous speech using F0 information. In the training phase, the F0 contours of hand-labelled minor phrases are parameterized according to the superpositional model proposed by Fujisaki and Hirose and grouped into clusters by a clustering method; the model parameters of each reference template are calculated as an approximation of the corresponding cluster’s centroid. In the segmentation phase, N-best boundary candidates are extracted automatically by one-stage dynamic programming (DP) matching between the reference templates and the target F0 contour. About 90% of minor phrase boundaries were correctly detected in speaker-independent experiments on the ATR (Advanced Telecommunications Research Institute International) Japanese continuous speech database.
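The superpositional model named in the abstract can be sketched in a few lines. In its commonly cited form, ln F0(t) is the sum of a baseline value ln Fb, phrase components (impulse responses of a critically damped second-order system), and accent components (step responses of a similar system, clipped at a ceiling). The code below is a minimal illustration under stated assumptions, not the chapter's implementation: the function names, the command tuples, and the default constants (alpha = 3.0, beta = 20.0, gamma = 0.9) are illustrative choices, not values taken from this chapter.

```python
import math


def phrase_response(t, alpha=3.0):
    """Phrase control response Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Accent control response Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, fb, phrase_cmds, accent_cmds, alpha=3.0, beta=20.0, gamma=0.9):
    """Return ln F0 at time t (seconds).

    phrase_cmds: list of (onset, amplitude) tuples         (hypothetical layout)
    accent_cmds: list of (onset, offset, amplitude) tuples (hypothetical layout)
    """
    value = math.log(fb)
    for onset, amp in phrase_cmds:
        value += amp * phrase_response(t - onset, alpha)
    for onset, offset, amp in accent_cmds:
        rise = accent_response(t - onset, beta, gamma)
        fall = accent_response(t - offset, beta, gamma)
        value += amp * (rise - fall)
    return value


def f0_contour(times, fb, phrase_cmds, accent_cmds, **params):
    """Evaluate F0 in Hz at each time in `times`."""
    return [math.exp(log_f0(t, fb, phrase_cmds, accent_cmds, **params)) for t in times]


# Illustrative values only: one phrase command and one accent command.
times = [i * 0.01 for i in range(150)]
contour = f0_contour(times, 80.0, [(0.0, 0.5)], [(0.3, 0.8, 0.4)])
```

A parameter vector of this kind (baseline, command timings, and amplitudes) is the sort of representation that can be clustered and then rendered back into a template contour for matching against a target F0 contour.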
References
H. Fujisaki and K. Hirose. Analysis of voice fundamental frequency contours for declarative sentences of Japanese. J. Acoust. Soc. Japan (E), 5:233–242, 1984.
H. Fujisaki, K. Hirose, and H. Lei. Prosody and syntax in spoken sentences of Standard Chinese. In Proceedings of the International Conference on Spoken Language Processing, Banff, Canada, pp. 433–436, 1992.
H. Fujisaki. Dynamic characteristics of voice fundamental frequency in speech and singing. In P. MacNeilage, editor, The Production of Speech, pp. 39–55. Berlin: Springer-Verlag, 1983.
N. Higuchi, T. Hirai, and Y. Sagisaka. Effect of speaking style on parameters of voice fundamental frequency generation model. In Proceedings of the Conference IEICE, Vol. SA-5–3, pp. 488–489, 1994.
T. Hirai, N. Iwahashi, H. Valbert, N. Higuchi, and Y. Sagisaka. Fundamental frequency contour modelling using statistical analysis. In Proceedings of the Acoust. Soc. Jpn. Autumn 93, pp. 225–226, 1993.
A. Komatsu, E. Oohira, and A. Ichikawa. Conversational speech understanding based on sentence structure inference using prosodics, and word spotting. Trans. IEICE, (D), J71-D:1218–1228, 1988.
Y. Linde, A. Buzo, and R. M. Gray. An algorithm for vector quantizer design. IEEE Trans. Commun., COM-28:84–95, 1980.
W. A. Lea, M. F. Medress, and T. E. Skinner. A prosodically guided speech understanding strategy. IEEE Trans. Acoust., Speech, Signal Processing, ASSP-23:30–38, 1975.
H. Ney. The use of a one-stage dynamic programming algorithm for connected word recognition. IEEE Trans. Acoust., Speech, Signal Processing, ASSP-32:263–271, 1984.
M. Nakai and H. Shimodaira. Accent phrase segmentation by finding N-best sequences of pitch pattern templates. In Proceedings of the International Conference on Spoken Language Processing, Yokohama, Japan, Vol. 1, pp. 347–350, 1994.
R. Schwartz and Y. L. Chow. The N-best algorithm: an efficient and exact procedure for finding the N most likely sentence hypotheses. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Vol. S2.12, pp. 81–84, 1990.
S. Sagayama and S. Furui. A technique for pitch extraction by lag-window method. In Proceedings of the Conference IEICE, 1235, 1978.
H. Shimodaira, M. Kimura, and S. Sagayama. Phrase segmentation of continuous speech by pitch contour DP matching. In Papers of Technical Group on Speech, Vol. SP90-72. IEICE, 1990.
Y. Suzuki, Y. Sekiguchi, and M. Shigenaga. Detection of phrase boundaries using prosodics for continuous speech recognition. Trans. IEICE, (D-II), J72-D-II: 1606–1617, 1989.
Y. Sagisaka, K. Takeda, M. Abe, S. Katagiri, T. Umeda, and H. Kuwabara. A large-scale Japanese speech database. In Proceedings of the International Conference on Spoken Language Processing, Kobe, Japan, pp. 1089–1092, 1990.
T. Ukita, S. Nakagawa, and T. Sakai. A use of pitch contour in recognizing spoken Japanese arithmetic expressions. Trans. IEICE, (D), J63-D:954–961, 1980.
C. W. Wightman and M. Ostendorf. Automatic recognition of prosodic phrases. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pp. 321–324, 1991.
Copyright information
© 1997 Springer-Verlag New York, Inc.
About this chapter
Cite this chapter
Nakai, M., Singer, H., Sagisaka, Y., Shimodaira, H. (1997). Accent Phrase Segmentation by F0 Clustering Using Superpositional Modelling. In: Sagisaka, Y., Campbell, N., Higuchi, N. (eds) Computing Prosody. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2258-3_22
DOI: https://doi.org/10.1007/978-1-4612-2258-3_22
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-7476-6
Online ISBN: 978-1-4612-2258-3
eBook Packages: Springer Book Archive