Accent Phrase Segmentation by F0 Clustering Using Superpositional Modelling

Computing Prosody

Abstract

We propose an automatic method for detecting minor phrase boundaries in Japanese continuous speech using F0 information. In the training phase, the F0 contours of hand-labelled minor phrases are parameterized according to the superpositional model proposed by Fujisaki and Hirose and grouped by a clustering method; the model parameters of each reference template are calculated as an approximation of the corresponding cluster’s centroid. In the segmentation phase, N-best boundary hypotheses are extracted automatically by one-stage dynamic programming (DP) matching between the reference templates and the target F0 contour. About 90% of minor phrase boundaries were correctly detected in speaker-independent experiments on the ATR (Advanced Telecommunications Research Institute International) Japanese continuous speech database.
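
As a rough illustration of the parameterization mentioned in the abstract, the following sketch evaluates an F0 contour from the commonly cited form of the Fujisaki and Hirose superpositional model, in which ln F0 is a baseline plus phrase and accent components. It is not the chapter's implementation. The function names, the constants alpha, beta and gamma, and the example phrase and accent commands are all assumptions chosen for illustration.

```python
import math

def phrase_response(t, alpha=3.0):
    """Phrase control impulse response: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Accent control step response, capped at gamma; zero for t < 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def log_f0(t, fb, phrases, accents, alpha=3.0, beta=20.0, gamma=0.9):
    """ln F0(t) = ln Fb + phrase components + accent components.

    phrases: list of (onset_time, amplitude)
    accents: list of (onset_time, offset_time, amplitude)
    """
    value = math.log(fb)
    for t0, ap in phrases:
        value += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accents:
        value += aa * (accent_response(t - t1, beta, gamma)
                       - accent_response(t - t2, beta, gamma))
    return value

# Hypothetical minor phrase: one phrase command and one accent command
# (all values are illustrative assumptions, not data from the chapter).
times = [i * 0.01 for i in range(100)]  # 1 s sampled every 10 ms
contour = [
    math.exp(log_f0(t, fb=100.0, phrases=[(0.0, 0.5)], accents=[(0.15, 0.45, 0.4)]))
    for t in times
]
```

In a setup like the one described, the small parameter set per minor phrase (command timings and amplitudes) is what would be clustered, rather than the raw contour samples.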

References

  1. H. Fujisaki and K. Hirose. Analysis of voice fundamental frequency contours for declarative sentences of Japanese. J. Acoust. Soc. Japan (E), 5:233–242, 1984.

  2. H. Fujisaki, K. Hirose, and H. Lei. Prosody and syntax in spoken sentences of Standard Chinese. In Proceedings of the International Conference on Spoken Language Processing, Banff, Canada, pp. 433–436, 1992.

  3. H. Fujisaki. Dynamic characteristics of voice fundamental frequency in speech and singing. In P. MacNeilage, editor, The Production of Speech, pp. 39–55. Berlin: Springer-Verlag, 1983.

  4. N. Higuchi, T. Hirai, and Y. Sagisaka. Effect of speaking style on parameters of voice fundamental frequency generation model. In Proceedings of the Conference IEICE, Vol. SA-5–3, pp. 488–489, 1994.

  5. T. Hirai, N. Iwahashi, H. Valbert, N. Higuchi, and Y. Sagisaka. Fundamental frequency contour modelling using statistical analysis. In Proceedings of the Acoust. Soc. Jpn. Autumn 93, pp. 225–226, 1993.

  6. A. Komatsu, E. Oohira, and A. Ichikawa. Conversational speech understanding based on sentence structure inference using prosodics, and word spotting. Trans. IEICE, (D), J71-D:1218–1228, 1988.

  7. Y. Linde, A. Buzo, and R. M. Gray. An algorithm for vector quantizer design. IEEE Trans. Commun., COM-28:84–95, 1980.

  8. W. A. Lea, M. F. Medress, and T. E. Skinner. A prosodically guided speech understanding strategy. IEEE Trans. Acoust., Speech, Signal Processing, ASSP-23:30–38, 1975.

  9. H. Ney. The use of a one-stage dynamic programming algorithm for connected word recognition. IEEE Trans. Acoust., Speech, Signal Processing, ASSP-32:263–271, 1984.

  10. M. Nakai and H. Shimodaira. Accent phrase segmentation by finding N-best sequences of pitch pattern templates. In Proceedings of the International Conference on Spoken Language Processing, Yokohama, Japan, Vol. 1, pp. 347–350, 1994.

  11. R. Schwartz and Y. L. Chow. The N-best algorithm: an efficient and exact procedure for finding the N most likely sentence hypotheses. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Vol. S2. 12, pp. 81–84, 1990.

  12. S. Sagayama and S. Furui. A technique for pitch extraction by lag-window method. In Proceedings of the Conference IEICE, 1235, 1978.

  13. H. Shimodaira, M. Kimura, and S. Sagayama. Phrase segmentation of continuous speech by pitch contour DP matching. In Papers of Technical Group on Speech, Vol. SP90-72. IEICE, 1990.

  14. Y. Suzuki, Y. Sekiguchi, and M. Shigenaga. Detection of phrase boundaries using prosodics for continuous speech recognition. Trans. IEICE, (D-II), J72-D-II: 1606–1617, 1989.

  15. Y. Sagisaka, K. Takeda, M. Abe, S. Katagiri, T. Umeda, and H. Kuwabara. A large-scale Japanese speech database. In Proceedings of the International Conference on Spoken Language Processing, Kobe, Japan, pp. 1089–1092, 1990.

  16. T. Ukita, S. Nakagawa, and T. Sakai. A use of pitch contour in recognizing spoken Japanese arithmetic expressions. Trans. IEICE, (D), J63-D:954–961, 1980.

  17. C. W. Wightman and M. Ostendorf. Automatic recognition of prosodic phrases. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pp. 321–324, 1991.

Copyright information

© 1997 Springer-Verlag New York, Inc.

About this chapter

Cite this chapter

Nakai, M., Singer, H., Sagisaka, Y., Shimodaira, H. (1997). Accent Phrase Segmentation by F0 Clustering Using Superpositional Modelling. In: Sagisaka, Y., Campbell, N., Higuchi, N. (eds) Computing Prosody. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2258-3_22

  • DOI: https://doi.org/10.1007/978-1-4612-2258-3_22

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4612-7476-6

  • Online ISBN: 978-1-4612-2258-3

  • eBook Packages: Springer Book Archive
