Real-Time 3D Reconstruction of Human Vocal Folds via High-Speed Laser-Endoscopy

Henningson, Jann-Ole; Stamminger, Marc; Döllinger, Michael; Semmler, Marion

doi:10.1007/978-3-031-16449-1_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13437)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

Abstract

Conventional video endoscopy and high-speed video endoscopy of the human larynx provide practitioners only with information about the two-dimensional lateral and longitudinal deformation of the vocal folds. However, experiments have shown that the vibration of human vocal folds has a significant vertical component. Based on an endoscopic laser projection unit (LPU) connected to a high-speed camera, we propose a fully automatic, real-time capable approach for the robust 3D reconstruction of human vocal folds. We achieve this by estimating laser ray correspondences while taking the epipolar constraints of the LPU into account. Unlike previous approaches, which reconstruct only the superior area of the vocal folds, our pipeline is based on a parametric reinterpretation of the M5 vocal fold model as a tensor product surface. Not only can we generate visually authentic deformations of a dense vibrating vocal fold model, but we can also easily obtain metric measurements of points of interest on the reconstructed surfaces. Furthermore, we drastically lower the effort needed to visualize and measure the dynamics of the human laryngeal area during phonation. Additionally, we publish the first publicly available labeled in-vivo dataset of laser-based high-speed laryngoscopy videos. The source code and dataset are available at henningson.github.io/Vocal3D/.
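
Once a laser dot in the image has been matched to the laser ray that produced it, a metric 3D point can in principle be recovered by intersecting the camera viewing ray with that laser ray. The following is a minimal sketch of this triangulation step only, not the authors' implementation: it assumes both rays are already expressed in one calibrated coordinate frame, and the function name `triangulate_rays` and its parameters are hypothetical. Because two measured rays rarely intersect exactly, it returns the midpoint of the shortest segment between them.

```python
import numpy as np


def triangulate_rays(o_cam, d_cam, o_laser, d_laser):
    """Return the midpoint of the shortest segment between two 3D rays.

    o_cam, d_cam:     origin and direction of the camera viewing ray
    o_laser, d_laser: origin and direction of the matched laser ray
    All names are illustrative assumptions, not taken from the paper.
    """
    o_cam = np.asarray(o_cam, dtype=float)
    o_laser = np.asarray(o_laser, dtype=float)
    d_cam = np.asarray(d_cam, dtype=float)
    d_laser = np.asarray(d_laser, dtype=float)

    # Normalise directions so the closed-form solution below simplifies.
    d_cam = d_cam / np.linalg.norm(d_cam)
    d_laser = d_laser / np.linalg.norm(d_laser)

    w = o_cam - o_laser
    b = d_cam @ d_laser      # cosine of the angle between the rays
    d = d_cam @ w
    e = d_laser @ w
    denom = 1.0 - b * b      # approaches 0 when the rays are parallel
    if denom < 1e-12:
        raise ValueError("rays are (nearly) parallel; no unique closest point")

    # Ray parameters of the mutually closest points.
    t_cam = (b * e - d) / denom
    t_laser = (e - b * d) / denom

    p_cam = o_cam + t_cam * d_cam
    p_laser = o_laser + t_laser * d_laser
    return 0.5 * (p_cam + p_laser)


# Usage with made-up geometry: both rays pass through the same point.
point = triangulate_rays(
    o_cam=[0.0, 0.0, 0.0], d_cam=[1.0, 2.0, 50.0],
    o_laser=[5.0, 0.0, 0.0], d_laser=[-4.0, 2.0, 50.0],
)
```

Distances between points triangulated this way are in the units of the calibration, which is what makes metric measurements on the reconstructed surface possible.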

Acknowledgement

We thank Florian Güthlein and Bernhard Egger for their valuable feedback. This work was supported by Deutsche Forschungsgemeinschaft (DFG) under grant STA662/6-1 (DFG project number: 448240908).

Author information

Authors and Affiliations

Friedrich-Alexander-University Erlangen-Nuremberg, Erlangen, Germany
Jann-Ole Henningson & Marc Stamminger
Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nuremberg, 91054, Erlangen, Germany
Michael Döllinger & Marion Semmler

Authors

Jann-Ole Henningson
Marc Stamminger
Michael Döllinger
Marion Semmler

Corresponding author

Correspondence to Jann-Ole Henningson.

Editor information

Editors and Affiliations

Rochester Institute of Technology, Rochester, NY, USA
Linwei Wang
Chinese University of Hong Kong, Hong Kong, Hong Kong
Qi Dou
University of Virginia, Charlottesville, VA, USA
P. Thomas Fletcher
National Center for Tumor Diseases (NCT/UCC), Dresden, Germany
Stefanie Speidel
Case Western Reserve University, Cleveland, OH, USA
Shuo Li

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 5183 KB)

Supplementary material 2 (mp4 2411 KB)

Supplementary material 3 (mp4 5332 KB)

About this paper

Cite this paper

Henningson, JO., Stamminger, M., Döllinger, M., Semmler, M. (2022). Real-Time 3D Reconstruction of Human Vocal Folds via High-Speed Laser-Endoscopy. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13437. Springer, Cham. https://doi.org/10.1007/978-3-031-16449-1_1

DOI: https://doi.org/10.1007/978-3-031-16449-1_1
Published: 17 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16448-4
Online ISBN: 978-3-031-16449-1
eBook Packages: Computer Science, Computer Science (R0)