
Accurate glottal model parametrization by integrating audio and high-speed endoscopic video data

  • Original Paper
  • Signal, Image and Video Processing

Abstract

The aim of this paper is to evaluate the effectiveness of using video data for voice source parametrization when voice production is represented through physical modeling. Laryngeal imaging techniques can be used to obtain vocal fold video sequences and to derive time patterns of relevant glottal cues, such as fold edge position or glottal area. In many physically based numerical models of the vocal folds, these parameters are instead estimated from the inverse-filtered glottal flow waveform, which is obtained from audio recordings of the sound pressure at the lips. However, this model inversion process is often problematic and suffers from limited accuracy and robustness. This paper discusses how video analysis of fold vibration might be coupled with parametric estimation algorithms based on voice recordings to improve the accuracy and robustness of model inversion.
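The video-based cue extraction mentioned above (deriving a glottal area time pattern from vocal fold frames) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it assumes each frame is a 2-D grid of 8-bit gray levels, that the glottal gap is darker than the surrounding tissue, and that a single global threshold separates the two. The function names and the open-quotient and f0 estimates are hypothetical, illustrative choices.

```python
"""Minimal sketch: glottal area waveform from high-speed video frames.

Assumptions (illustrative, not taken from the paper): each frame is a 2-D
grid of 8-bit gray levels, the glottal gap is darker than the surrounding
tissue, and one global threshold separates the two.
"""
from typing import List, Sequence

Frame = Sequence[Sequence[int]]


def glottal_area(frame: Frame, threshold: int) -> int:
    """Count pixels darker than `threshold`, assumed to lie inside the glottis."""
    return sum(1 for row in frame for px in row if px < threshold)


def area_waveform(frames: Sequence[Frame], threshold: int) -> List[int]:
    """Glottal area (in pixels) for every frame of the sequence."""
    return [glottal_area(f, threshold) for f in frames]


def open_quotient(area: Sequence[int], closed_level: int = 0) -> float:
    """Fraction of frames in which the area exceeds `closed_level` (glottis open)."""
    if not area:
        raise ValueError("empty area waveform")
    return sum(1 for a in area if a > closed_level) / len(area)


def fundamental_frequency(area: Sequence[int], fps: float, closed_level: int = 0) -> float:
    """Estimate f0 (Hz) from the mean spacing of glottal opening instants."""
    is_open = [a > closed_level for a in area]
    # An opening instant is a closed-to-open transition between consecutive frames.
    onsets = [i for i in range(1, len(is_open)) if is_open[i] and not is_open[i - 1]]
    if len(onsets) < 2:
        raise ValueError("need at least two glottal cycles")
    periods = [b - a for a, b in zip(onsets, onsets[1:])]
    return fps / (sum(periods) / len(periods))


if __name__ == "__main__":
    # Synthetic example: 3 cycles of 20 frames each (8 closed, 12 open).
    cycle = [0] * 8 + [1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1]

    def make_frame(n_dark: int, rows: int = 3, cols: int = 4) -> List[List[int]]:
        # First n_dark pixels (row-major) are dark (20), the rest bright (200).
        flat = [20 if k < n_dark else 200 for k in range(rows * cols)]
        return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

    frames = [make_frame(n) for n in cycle * 3]
    area = area_waveform(frames, threshold=100)
    print(area == cycle * 3)                        # True
    print(round(open_quotient(area), 3))            # 0.6
    print(fundamental_frequency(area, fps=4000.0))  # 200.0
```

On real endoscopic recordings, illumination varies across frames, so a per-sequence threshold or a more robust segmentation step would likely be needed before such a waveform could be used alongside the audio-based estimate.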





Acknowledgments

We wish to thank Cymo B.V., Groningen, The Netherlands, for kindly providing the acoustic and videokymographic data used in this paper. We also wish to thank the two anonymous reviewers for their valuable comments and suggestions.

Author information

Correspondence to Carlo Drioli.


About this article

Cite this article

Drioli, C., Foresti, G.L. Accurate glottal model parametrization by integrating audio and high-speed endoscopic video data. SIViP 9, 1451–1459 (2015). https://doi.org/10.1007/s11760-013-0597-0


