Combining Atom Decomposition of the F0 Track and HMM-based Phonological Phrase Modelling for Robust Stress Detection in Speech

  • Conference paper
Speech and Computer (SPECOM 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9811)

Abstract

The Weighted Correlation based Atom Decomposition (WCAD) algorithm is a technique for intonation modelling that uses a matching pursuit framework to decompose the F0 contour into a set of basic components, called atoms. The atoms attempt to model the physiological activation of the laryngeal muscles responsible for changes in F0. Recently, WCAD has been upgraded to use the orthogonal matching pursuit (OMP) algorithm, which gives qualitative improvements in the modelling of intonation. One possible application of the OMP-based WCAD is the automatic detection of stress in speech, which we undertake for Hungarian. We demonstrate a correlation between stress and atomic peaks, as well as between stress and atomic valleys on the preceding syllable. The WCAD-based stress detection technique is compared to a baseline system using HMM/GMM stress/phrase models. A 7% improvement in F-measure over the baseline is observed when evaluating against a hand-made reference. Finally, we propose a hybrid approach that outperforms both individual systems (by 11% compared to the baseline).
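
To make the decomposition step concrete, the following is a minimal sketch of orthogonal matching pursuit on a toy signal. It is not the WCAD implementation: the atom shape (a gamma-like kernel), the shift grid, and the names gamma_kernel, build_dictionary, and omp are assumptions made for illustration, and atoms are selected by plain inner product rather than by WCAD's weighted correlation.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gamma_kernel(length, k=2, theta=3.0):
    # Hypothetical atom shape: t^(k-1) * exp(-t/theta), sampled at integer t.
    return [(t ** (k - 1)) * math.exp(-t / theta) for t in range(length)]


def build_dictionary(n_samples, shifts, kernel_len=20):
    # One unit-norm atom per shift; kernels running past the end are truncated.
    kernel = gamma_kernel(kernel_len)
    atoms = []
    for s in shifts:
        atom = [0.0] * n_samples
        for t, v in enumerate(kernel):
            if s + t < n_samples:
                atom[s + t] = v
        norm = math.sqrt(dot(atom, atom))
        atoms.append([v / norm for v in atom])
    return atoms


def solve(A, b):
    # Gaussian elimination with partial pivoting for a small system A x = b.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def omp(signal, dictionary, n_atoms):
    residual = list(signal)
    chosen, weights = [], []
    for _ in range(n_atoms):
        # Select the not-yet-chosen atom most correlated with the residual.
        scores = [abs(dot(residual, atom)) for atom in dictionary]
        for k in chosen:
            scores[k] = -1.0
        best = max(range(len(dictionary)), key=scores.__getitem__)
        chosen.append(best)
        # Orthogonal step: re-fit the weights of all chosen atoms jointly
        # by solving the normal equations (least squares on the signal).
        gram = [[dot(dictionary[i], dictionary[j]) for j in chosen] for i in chosen]
        rhs = [dot(dictionary[i], signal) for i in chosen]
        weights = solve(gram, rhs)
        # Residual is the signal minus the current reconstruction.
        residual = [
            s - sum(w * dictionary[k][t] for w, k in zip(weights, chosen))
            for t, s in enumerate(signal)
        ]
    return chosen, weights, residual


# Toy contour built from two atoms: one positive peak and one negative valley.
atoms = build_dictionary(60, range(0, 55, 5))
signal = [2.0 * a - 1.0 * b for a, b in zip(atoms[2], atoms[6])]
chosen, weights, residual = omp(signal, atoms, n_atoms=2)
print(chosen, [round(w, 3) for w in weights])
```

Re-fitting all weights after each selection is what distinguishes OMP from plain matching pursuit, which only subtracts the newest atom's projection from the residual.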

Notes

  1. The WCAD implementation code is available on GitHub at https://github.com/dipteam/wcad.

Acknowledgments

This work was supported by the Hungarian National Innovation Office (OTKA-PD-112598, “Automatic Phonological Phrase and Prosodic Event Detection for the Extraction of Syntactic and Semantic/Pragmatic Information from Speech”) and by the Swiss National Science Foundation (No. CRSII2-147611/1, “SP2: SCOPES Project on Speech Prosody”).

Corresponding author

Correspondence to György Szaszák.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Szaszák, G., Tündik, M.Á., Gerazov, B., Gjoreski, A. (2016). Combining Atom Decomposition of the F0 Track and HMM-based Phonological Phrase Modelling for Robust Stress Detection in Speech. In: Ronzhin, A., Potapova, R., Németh, G. (eds) Speech and Computer. SPECOM 2016. Lecture Notes in Computer Science, vol 9811. Springer, Cham. https://doi.org/10.1007/978-3-319-43958-7_19

  • DOI: https://doi.org/10.1007/978-3-319-43958-7_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43957-0

  • Online ISBN: 978-3-319-43958-7

  • eBook Packages: Computer Science, Computer Science (R0)