
The Phonetic Grounding of Prosody: Analysis and Visualisation Tools

  • Conference paper
Human Language Technology. Challenges for Computer Science and Linguistics (LTC 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13212)


Abstract

A suite of related online and offline analysis and visualisation tools for training students of phonetics in the acoustics of prosody is described in detail. Prosody is informally understood as the rhythms and melodies of speech, whether relating to words, sentences, or longer stretches of discourse, including dialogue. The aim is to help bridge the epistemological gap between phonological analysis, which is based on the linguist’s intuition together with structural models, and phonetic analysis, which is based on measurements and physical models of the production, transmission (acoustic) and perception phases of the speech chain. The toolkit described in the present contribution applies to the acoustic domain: it analyses the low frequency (LF) amplitude modulation (AM) and frequency modulation (FM) of speech, and provides spectral analyses of the demodulated amplitude and frequency envelopes, in each case as an LF spectrum and an LF spectrogram. Clustering functions permit comparison of utterances.
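The AM part of the analysis named in the abstract can be illustrated with a minimal sketch. It is not taken from the toolkit itself: the function names (`am_envelope`, `lf_spectrum`, `synth_am_signal`), the rectify-and-smooth demodulation, the 20 Hz smoothing cutoff, the 10 Hz upper limit of the LF band, and the synthetic test signal are all assumptions chosen for illustration. The sketch demodulates the amplitude envelope of a signal and computes the LF spectrum of that envelope, where a rhythm shows up as a spectral peak.

```python
import numpy as np


def am_envelope(signal, fs, cutoff_hz=20.0):
    """Rough amplitude envelope: full-wave rectify, then moving-average smooth.

    Assumption: a simple stand-in for demodulation, not the toolkit's method.
    """
    rectified = np.abs(signal)
    win = max(1, int(fs / cutoff_hz))      # window of about one cutoff period
    kernel = np.ones(win) / win
    return np.convolve(rectified, kernel, mode="same")


def lf_spectrum(envelope, fs, fmax=10.0):
    """Magnitude spectrum of the envelope, restricted to 0 < f <= fmax."""
    env = envelope - np.mean(envelope)     # remove the DC offset
    mags = np.abs(np.fft.rfft(env * np.hanning(len(env))))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    keep = (freqs > 0) & (freqs <= fmax)
    return freqs[keep], mags[keep]


def synth_am_signal(fs=16000, dur=4.0, carrier_hz=200.0, rhythm_hz=4.0):
    """Hypothetical test signal: a 200 Hz carrier with a 4 Hz amplitude rhythm."""
    t = np.arange(int(fs * dur)) / fs
    modulator = 1.0 + 0.8 * np.sin(2 * np.pi * rhythm_hz * t)
    return modulator * np.sin(2 * np.pi * carrier_hz * t)


fs = 16000
x = synth_am_signal(fs)
env = am_envelope(x, fs)
freqs, mags = lf_spectrum(env, fs)
dominant_hz = freqs[np.argmax(mags)]       # strongest LF component
print(f"dominant LF component: {dominant_hz:.2f} Hz")
```

For recorded speech, `x` and `fs` would come from an audio file rather than `synth_am_signal`. An LF spectrogram would apply `lf_spectrum` to successive windows of the envelope, and the FM counterpart would run the same spectral step on an F0 track obtained from a pitch estimator.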


Notes

  1. http://wwwhomes.uni-bielefeld.de/gibbon/CRAFT/; code accessible on GitHub.


Author information


Corresponding author

Correspondence to Dafydd Gibbon.


Copyright information

© 2022 Springer Nature Switzerland AG


Cite this paper

Gibbon, D. (2022). The Phonetic Grounding of Prosody: Analysis and Visualisation Tools. In: Vetulani, Z., Paroubek, P., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2019. Lecture Notes in Computer Science, vol 13212. Springer, Cham. https://doi.org/10.1007/978-3-031-05328-3_3


  • DOI: https://doi.org/10.1007/978-3-031-05328-3_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-05327-6

  • Online ISBN: 978-3-031-05328-3

  • eBook Packages: Computer Science, Computer Science (R0)
