Comparison of F0 Control Rules Derived from Multiple Speech Databases

Abstract

In this paper we describe how computational models of F0 were derived from four different speech corpora and how their control characteristics were compared to assess the feasibility of prosody conversion for speech synthesis. A superpositional F0 control model was employed to reduce computational complexity, and a statistical optimization method was used to efficiently determine the dominant factors for F0 control in each speech corpus. The analyses showed both the invariance of some dominant control parameters and differences attributable to speaking style. These preliminary results also confirmed the usefulness of superpositional F0 control for prosody conversion.
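
The abstract does not spell out the form of the superpositional model. As a rough illustration only, the sketch below assumes a Fujisaki-style command-response formulation, in which the log F0 contour is a baseline plus additive phrase and accent components. The function names, the constants `alpha`, `beta`, and `gamma`, and all command timings and amplitudes are hypothetical values chosen for the example, not parameters reported in the chapter.

```python
import math

def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase component (zero before the command).
    alpha is an assumed, illustrative time constant."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent component, capped at gamma.
    beta and gamma are assumed, illustrative constants."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def log_f0(t, base_hz, phrases, accents):
    """Superposition in the log domain:
    ln F0(t) = ln(base) + sum of phrase terms + sum of accent terms."""
    value = math.log(base_hz)
    for onset, amplitude in phrases:
        value += amplitude * phrase_response(t - onset)
    for start, end, amplitude in accents:
        value += amplitude * (accent_response(t - start) - accent_response(t - end))
    return value

# Hypothetical commands: one phrase command at 0 s, one accent from 0.3 s to 0.7 s.
phrases = [(0.0, 0.5)]
accents = [(0.3, 0.7, 0.4)]

# F0 contour in Hz sampled every 10 ms over 1.5 s, with an assumed 120 Hz baseline.
contour = [math.exp(log_f0(0.01 * i, 120.0, phrases, accents)) for i in range(150)]
```

Because the components simply add in the log domain, each one can be estimated or swapped independently, which is one plausible reason a superpositional model is convenient for comparing corpora and for prosody conversion.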



Copyright information

© 1997 Springer-Verlag New York, Inc.

About this chapter

Cite this chapter

Hirai, T., Higuchi, N., Sagisaka, Y. (1997). Comparison of F0 Control Rules Derived from Multiple Speech Databases. In: Sagisaka, Y., Campbell, N., Higuchi, N. (eds) Computing Prosody. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2258-3_14

  • DOI: https://doi.org/10.1007/978-1-4612-2258-3_14

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4612-7476-6

  • Online ISBN: 978-1-4612-2258-3

  • eBook Packages: Springer Book Archive
