Skip to main content

A Linguistic Interpretation of the Atom Decomposition of Fundamental Frequency Contour for American English

  • Conference paper
  • First Online:
Book cover Speech and Computer (SPECOM 2016)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9811))

Included in the following conference series:

  • 2218 Accesses

Abstract

One of the most recently proposed techniques for modeling the prosody of an utterance is the decomposition of its pitch, duration and/or energy contour into physiologically motivated units called atoms, based on matching pursuit. Since this model is based on the physiology of the production of sentence intonation, it is essentially language independent. However, the intonation of an utterance in a particular language is obviously under the influence of factors of a predominantly linguistic nature. In this research, restricted to the case of American English with prosody annotated using standard ToBI conventions, we have shown that, under certain mild constraints, the positive and negative atoms identified in the pitch contour coincide very well with high and low pitch accents and phrase accents of ToBI. By giving a linguistic interpretation of the atom decomposition model, this research enables its practical use in domains such as speech synthesis or cross-lingual prosody transfer.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Six recordings from the initial set of 910 recordings were excluded because they contained the ERROR tag where a boundary tone was expected.

References

  1. Fujisaki, H., Nagashima, S.: A model for the synthesis of pitch contours of connected speech. Technical Report, Engineering Research Institute. University of Tokyo, Japan (1969)

    Google Scholar 

  2. Strik, H.: Physiological control and behaviour of the voice source in the production of prosody, Ph.D. thesis, Department of Language and Speech, University of Nijmegen, Netherlands (1994)

    Google Scholar 

  3. Kochanski, G.P., Shih, C.: Stem-ML: Language independent prosody description. In: International Conference on Spoken Language Processing (ICSLP), vol. 3, pp. 239–242 (2000)

    Google Scholar 

  4. Honnet, P.-E., Gerazov, B., Garner, P.N.: Atom decomposition-based intonation modeling. In: IEEE International Conference on Acoustics, Speech and Signal Processing – ICASSP (2015)

    Google Scholar 

  5. Gerazov, B., Honnet, P-E., Gjoreski, A., Garner, P.: Weighted correlation based atom decomposition intonation modeling. In: INTERSPEECH (2015)

    Google Scholar 

  6. Pierrehumbert, J.B.: The phonetics and phonology of English intonation (Ph.D. thesis). MIT, Cambridge, MA, USA (1980)

    Google Scholar 

  7. Silverman, K., Beckman, M., Pitrelli, J., Ostendorf, M., Wightman, C., Price, P., Pierre-humbert, J., Hirschberg, J.: ToBI: A standard for labeling English prosody. In: Proceedings of the International Conference on Spoken Language Processing (ICSLP), pp. 867–870 (1992)

    Google Scholar 

  8. Taylor, P.: Analysis and synthesis of intonation using the Tilt model. J. Acoust. Soc. Am. 107(3), 1697–1714 (2000)

    Article  Google Scholar 

  9. Aubergé, V.: Prosody modeling with a dynamic lexicon of intonative forms: Application for text-to-speech synthesis. In: Proceedings of the ESCA Workshop on Prosody, pp. 62–65 (1993)

    Google Scholar 

  10. Holm, B,. Bailly G.: Generating prosody by superposing multi-parametric overlapping contours. In: Proceedings of the International Conference on Spoken Language Processing (ICSLP), pp. 203–206 (2000)

    Google Scholar 

  11. Kohler, K.J.: Studies in German intonation, Arbeitsberichte des Instituts für Phonetik und digitale Sprachverarbeitung. Universität Kiel, vol. 25, 295–360 (1991)

    Google Scholar 

  12. Kohler, K.J.: Parametric control of prosodic variables by symbolic input in TTS synthesis. In: van Santen, J., Sproat, R., Olive, J., Hirschberg, J. (eds.) Progress in Speech Synthesis, pp. 459–475. Springer, New York (1997)

    Chapter  Google Scholar 

  13. Beckman, M.E., Hirschberg, J., Shattuck-Hufnagel, S.: The original ToBI system and the evolution of the ToBI framework. In: Jun, S.-A. (ed.) Prosodic Typology: The Phonology of Intonation and Phrasing, pp. 9–54. Oxford University Press, UK (2005)

    Chapter  Google Scholar 

  14. Ostendorf, M., Price, P., Shattuck-Hufnagel, S.: The Boston University Radio News Corpus. Linguistic Data Consortium (1995)

    Google Scholar 

  15. Mallat, S.G., Zhang, Z.: Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 41(12), 3397–3415 (1993)

    Article  MATH  Google Scholar 

  16. Hermes, D.J.: Measuring the perceptual similarity of pitch contours. J. Speech Lang. Hear. Res. 41(1), 73–82 (1998)

    Article  Google Scholar 

  17. Öhman, S.: Word and sentence intonation: A quantitative model. Speech Transmission Laboratory, Department of Speech Communication, Royal Institute of Technology (1967)

    Google Scholar 

  18. Prom-on, S., Xu, Y., Thipakorn, B.: Modeling tone and intonation in Mandarin and English as a process of target approximation. J. Acoust. Soc. Am. 125, 405–424 (2009)

    Article  Google Scholar 

  19. Mixdorff, H.: A novel approach to the fully automatic extraction of Fujisaki model parameters. In: ICASSP 2000, vol. 3, pp. 1281–1284 (2000)

    Google Scholar 

Download references

Acknowledgments

The presented study was supported in part by the Ministry of Education, Science and Technological Development of the Republic of Serbia (grant TR32035), and was carried out within the SCOPES project “SP2: SCOPES Project for Speech Prosody” (No. CRSII2-147611/1), supported by Swiss National Science Foundation. The authors are grateful to the company Speech Morphing, Inc. from Campbell, CA, USA, for providing the speech corpus used in the experiments.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Milan Sečujski .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Delić, T., Gerazov, B., Popović, B., Sečujski, M. (2016). A Linguistic Interpretation of the Atom Decomposition of Fundamental Frequency Contour for American English. In: Ronzhin, A., Potapova, R., Németh, G. (eds) Speech and Computer. SPECOM 2016. Lecture Notes in Computer Science(), vol 9811. Springer, Cham. https://doi.org/10.1007/978-3-319-43958-7_6

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-43958-7_6

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43957-0

  • Online ISBN: 978-3-319-43958-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics