Real-Time 3D Reconstruction of Human Vocal Folds via High-Speed Laser-Endoscopy

  • Conference paper
Medical Image Computing and Computer Assisted Intervention – MICCAI 2022 (MICCAI 2022)

Abstract

Conventional video endoscopy and high-speed video endoscopy of the human larynx provide practitioners only with information about the two-dimensional lateral and longitudinal deformation of the vocal folds. However, experiments have shown that the vibration of human vocal folds has a significant vertical component. Based on an endoscopic laser projection unit (LPU) connected to a high-speed camera, we propose a fully automatic, real-time-capable approach for the robust 3D reconstruction of human vocal folds. We achieve this by estimating laser ray correspondences while taking the epipolar constraints of the LPU into account. Unlike previous approaches, which reconstruct only the superior area of the vocal folds, our pipeline is based on a parametric reinterpretation of the M5 vocal fold model as a tensor product surface. This lets us generate visually authentic deformations of a dense vibrating vocal fold model and easily obtain metric measurements of points of interest on the reconstructed surfaces. Furthermore, we drastically lower the effort needed to visualize and measure the dynamics of the human laryngeal area during phonation. Additionally, we publish the first publicly available labeled in-vivo dataset of laser-based high-speed laryngoscopy videos. The source code and dataset are available at henningson.github.io/Vocal3D/.
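
To make the term "tensor product surface" concrete, the following minimal Python sketch evaluates a surface point as a doubly weighted sum of a grid of control points. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the basis functions or the control-point layout, so a Bézier patch with Bernstein basis functions is used here only as the simplest tensor product surface, and the names `bernstein`, `eval_patch`, and `CTRL` as well as all control-point values are hypothetical.

```python
from math import comb


def bernstein(n, i, t):
    """Bernstein basis polynomial B_{i,n}(t) for t in [0, 1]."""
    return comb(n, i) * t**i * (1.0 - t) ** (n - i)


def eval_patch(ctrl, u, v):
    """Evaluate a tensor product Bezier surface at parameters (u, v).

    ctrl is a rectangular grid (list of rows) of 3D control points.
    The surface point is sum_i sum_j B_i(u) * B_j(v) * ctrl[i][j].
    """
    n = len(ctrl) - 1      # polynomial degree in the u direction
    m = len(ctrl[0]) - 1   # polynomial degree in the v direction
    point = [0.0, 0.0, 0.0]
    for i, row in enumerate(ctrl):
        bu = bernstein(n, i, u)
        for j, p in enumerate(row):
            w = bu * bernstein(m, j, v)  # product of the two 1D weights
            for k in range(3):
                point[k] += w * p[k]
    return tuple(point)


# Hypothetical 3x3 control grid: a flat sheet whose centre control
# point is raised by one unit along z (illustrative values only).
CTRL = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)],
    [(1.0, 0.0, 0.0), (1.0, 1.0, 1.0), (1.0, 2.0, 0.0)],
    [(2.0, 0.0, 0.0), (2.0, 1.0, 0.0), (2.0, 2.0, 0.0)],
]

if __name__ == "__main__":
    # Sample the patch centre; the raised control point lifts the surface.
    print(eval_patch(CTRL, 0.5, 0.5))
```

Because every surface point depends on the control points only through these weights, moving a handful of control points deforms the whole dense surface smoothly, and a distance between two evaluated points is in whatever units the control points are expressed in.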

Acknowledgement

We thank Florian Güthlein and Bernhard Egger for their valuable feedback. This work was supported by Deutsche Forschungsgemeinschaft (DFG) under grant STA662/6-1 (DFG project number: 448240908).

Author information

Corresponding author

Correspondence to Jann-Ole Henningson.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Henningson, JO., Stamminger, M., Döllinger, M., Semmler, M. (2022). Real-Time 3D Reconstruction of Human Vocal Folds via High-Speed Laser-Endoscopy. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13437. Springer, Cham. https://doi.org/10.1007/978-3-031-16449-1_1

  • DOI: https://doi.org/10.1007/978-3-031-16449-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16448-4

  • Online ISBN: 978-3-031-16449-1

  • eBook Packages: Computer Science, Computer Science (R0)
