Accurate glottal model parametrization by integrating audio and high-speed endoscopic video data

Drioli, Carlo; Foresti, Gian Luca

doi:10.1007/s11760-013-0597-0

Original Paper
Published: 07 January 2014

Volume 9, pages 1451–1459 (2015)
Signal, Image and Video Processing

Abstract

The aim of this paper is to evaluate the effectiveness of using video data for voice source parametrization when voice production is represented through physical modeling. Laryngeal imaging techniques can be used to obtain video sequences of the vocal folds and to derive the time patterns of relevant glottal cues, such as vocal fold edge position or glottal area. In many physically based numerical models of the vocal folds, the model parameters are estimated from the glottal flow waveform obtained by inverse filtering audio recordings of the sound pressure at the lips. However, this model inversion process is often problematic and suffers from accuracy and robustness issues. This paper discusses how video analysis of vocal fold vibration might be coupled with parametric estimation algorithms based on voice recordings to improve the accuracy and robustness of model inversion.
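As an illustration of how a time pattern such as glottal area can be derived from vocal fold video, here is a minimal sketch. It assumes that the glottal gap is the darkest region in each frame and that a single global intensity threshold separates it from the surrounding tissue. This is a simplification chosen for illustration and not the authors' segmentation pipeline. The function names `glottal_area_waveform` and `open_quotient` and the parameters `threshold` and `rel_eps` are hypothetical. The sketch also derives a simple open quotient from the resulting waveform.

```python
from typing import List, Sequence

# One grayscale frame: rows of pixel intensities (0..255).
Frame = Sequence[Sequence[int]]


def glottal_area(frame: Frame, threshold: int) -> int:
    """Count pixels darker than `threshold`.

    Assumption: dark pixels belong to the glottal gap (global thresholding).
    """
    return sum(1 for row in frame for px in row if px < threshold)


def glottal_area_waveform(frames: Sequence[Frame], threshold: int) -> List[int]:
    """Glottal area (in pixels) for each frame of a high-speed sequence."""
    return [glottal_area(f, threshold) for f in frames]


def open_quotient(area: Sequence[int], rel_eps: float = 0.05) -> float:
    """Fraction of frames in which the glottis is considered open.

    A frame counts as open when its area exceeds rel_eps times the peak area
    (a hypothetical criterion chosen for illustration).
    """
    peak = max(area)
    if peak == 0:
        return 0.0
    return sum(1 for a in area if a > rel_eps * peak) / len(area)


def synthetic_frame(n_dark: int, size: int = 4, bright: int = 200, dark: int = 20) -> List[List[int]]:
    """Build a toy size x size frame whose first n_dark pixels are dark."""
    pixels = [dark if i < n_dark else bright for i in range(size * size)]
    return [pixels[r * size:(r + 1) * size] for r in range(size)]


# Toy sequence: closed, opening, fully open, closing.
frames = [synthetic_frame(n) for n in (0, 2, 4, 2)]
area = glottal_area_waveform(frames, threshold=100)
print(area)                 # [0, 2, 4, 2]
print(open_quotient(area))  # 0.75
```

On real endoscopic or videokymographic data, a fixed global threshold is fragile under uneven illumination. Adaptive thresholding or contour-based segmentation would be the natural replacement for `glottal_area`, while the rest of the sketch would stay the same.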

Acknowledgments

We wish to thank Cymo B.V., Groningen, The Netherlands, for kindly providing the acoustic and videokymographic data used in this paper. We also wish to thank the two anonymous reviewers for their valuable comments and suggestions.

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Udine, Via delle Scienze 206, 33100 Udine, Italy
Carlo Drioli & Gian Luca Foresti

Corresponding author

Correspondence to Carlo Drioli.

About this article

Cite this article

Drioli, C., Foresti, G.L. Accurate glottal model parametrization by integrating audio and high-speed endoscopic video data. SIViP 9, 1451–1459 (2015). https://doi.org/10.1007/s11760-013-0597-0

Received: 17 July 2013
Revised: 29 November 2013
Accepted: 10 December 2013
Published: 07 January 2014
Issue Date: September 2015
DOI: https://doi.org/10.1007/s11760-013-0597-0
