Abstract
Conventional video endoscopy and high-speed video endoscopy of the human larynx provide practitioners only with information about the two-dimensional lateral and longitudinal deformation of the vocal folds. However, experiments have shown that the vibration of human vocal folds has a significant vertical component. Based on an endoscopic laser projection unit (LPU) connected to a high-speed camera, we propose a fully automatic and real-time capable approach for the robust 3D reconstruction of human vocal folds. We achieve this by estimating laser ray correspondences under the epipolar constraints of the LPU. Unlike previous approaches, which reconstruct only the superior area of the vocal folds, our pipeline is based on a parametric reinterpretation of the M5 vocal fold model as a tensor product surface. Not only are we able to generate visually authentic deformations of a dense vibrating vocal fold model, but we are also able to easily generate metric measurements of points of interest on the reconstructed surfaces. Furthermore, we drastically lower the effort needed for visualizing and measuring the dynamics of the human laryngeal area during phonation. Additionally, we publish the first publicly available labeled in-vivo dataset of laser-based high-speed laryngoscopy videos. The source code and dataset are available at henningson.github.io/Vocal3D/.
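For readers unfamiliar with the term, the following minimal sketch illustrates what a tensor product surface is and how a metric measurement between two points on it can be taken. It is not the pipeline described in the abstract: it assumes a Bézier basis (the simplest tensor product form, chosen here only for brevity), and the control grid `CTRL` and the helper names `bernstein`, `surface_point`, and `distance` are hypothetical.

```python
from math import comb


def bernstein(n, i, t):
    """Bernstein basis polynomial B_{i,n}(t) for t in [0, 1]."""
    return comb(n, i) * (t ** i) * ((1.0 - t) ** (n - i))


def surface_point(ctrl, u, v):
    """Evaluate a tensor product Bezier surface at parameters (u, v).

    ctrl is a rectangular grid (list of rows) of 3D control points.
    Each control point is weighted by the product of one basis
    function in u and one in v, hence "tensor product".
    """
    n = len(ctrl) - 1
    m = len(ctrl[0]) - 1
    point = [0.0, 0.0, 0.0]
    for i in range(n + 1):
        basis_u = bernstein(n, i, u)
        for j in range(m + 1):
            weight = basis_u * bernstein(m, j, v)
            for k in range(3):
                point[k] += weight * ctrl[i][j][k]
    return tuple(point)


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


# Hypothetical 3x3 control grid (arbitrary units); the raised centre
# point makes the surface bulge upward in z.
CTRL = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)],
    [(1.0, 0.0, 0.0), (1.0, 1.0, 2.0), (1.0, 2.0, 0.0)],
    [(2.0, 0.0, 0.0), (2.0, 1.0, 0.0), (2.0, 2.0, 0.0)],
]

if __name__ == "__main__":
    corner = surface_point(CTRL, 0.0, 0.0)
    centre = surface_point(CTRL, 0.5, 0.5)
    print(corner, centre, distance(corner, centre))
```

Because every surface point is a smooth function of the control points, moving a few control points deforms the whole surface, and distances between points of interest follow directly from the evaluated coordinates.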
Acknowledgement
We thank Florian Güthlein and Bernhard Egger for their valuable feedback. This work was supported by Deutsche Forschungsgemeinschaft (DFG) under grant STA662/6-1 (DFG project number: 448240908).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Henningson, JO., Stamminger, M., Döllinger, M., Semmler, M. (2022). Real-Time 3D Reconstruction of Human Vocal Folds via High-Speed Laser-Endoscopy. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13437. Springer, Cham. https://doi.org/10.1007/978-3-031-16449-1_1
DOI: https://doi.org/10.1007/978-3-031-16449-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16448-4
Online ISBN: 978-3-031-16449-1
eBook Packages: Computer Science, Computer Science (R0)