Low-light image enhancement of high-speed endoscopic videos using a convolutional neural network

Gómez, Pablo; Semmler, Marion; Schützenberger, Anne; Bohr, Christopher; Döllinger, Michael

doi:10.1007/s11517-019-01965-4

Original Article
Published: 21 March 2019

Medical & Biological Engineering & Computing, volume 57, pages 1451–1463 (2019)

Pablo Gómez ORCID: orcid.org/0000-0002-5631-8240¹,
Marion Semmler¹,
Anne Schützenberger¹,
Christopher Bohr² &
…
Michael Döllinger¹

Abstract

Laryngeal endoscopy is one of the primary diagnostic tools for laryngeal disorders. The main techniques are videostroboscopy and, more recently, high-speed video endoscopy. Unfortunately, due to the restrictive anatomy of the larynx and technical limitations of the recording equipment, many videos suffer from insufficient illumination, which complicates clinical examination and analysis. This work presents an approach to enhance low-light images from high-speed video endoscopy using a convolutional neural network. We introduce a new technique to generate realistically darkened training samples using Perlin noise. Extensive data augmentation is employed to cope with the limited training data, allowing training with just 55 videos. The approach is compared against four state-of-the-art low-light enhancement methods and statistically significantly outperforms each of them on one no-reference (NIQE) and two full-reference (PSNR, SSIM) image quality metrics. The presented approach can be run on consumer-grade hardware and is therefore directly applicable in a clinical context. It is likely transferable to similar techniques such as videostroboscopy.
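To make the idea of Perlin-noise darkening and full-reference evaluation more concrete, the following is a minimal sketch and not the authors' implementation, whose details are not given in the abstract. It assumes that a well-lit grayscale frame with values in [0, 1] is multiplied by a smooth, spatially varying gain map derived from 2D Perlin noise, and that the darkened result is then compared with the original using PSNR. The function names, the lattice size `cells`, and the gain range `min_gain`/`max_gain` are hypothetical choices made for illustration only.

```python
import math
import random


def make_gradients(cells, rng):
    """Random unit gradient vectors on a (cells+1) x (cells+1) lattice."""
    grads = []
    for _ in range(cells + 1):
        row = []
        for _ in range(cells + 1):
            angle = rng.uniform(0.0, 2.0 * math.pi)
            row.append((math.cos(angle), math.sin(angle)))
        grads.append(row)
    return grads


def fade(t):
    """Perlin's quintic smoothing curve 6t^5 - 15t^4 + 10t^3."""
    return t * t * t * (t * (t * 6.0 - 15.0) + 10.0)


def lerp(a, b, t):
    return a + t * (b - a)


def perlin(x, y, grads):
    """2D gradient (Perlin) noise at (x, y), with 0 <= x, y < cells."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))

    def corner(ix, iy):
        # Dot product of the lattice gradient with the offset to (x, y).
        gx, gy = grads[iy][ix]
        return gx * (x - ix) + gy * (y - iy)

    u, v = fade(x - x0), fade(y - y0)
    top = lerp(corner(x0, y0), corner(x0 + 1, y0), u)
    bottom = lerp(corner(x0, y0 + 1), corner(x0 + 1, y0 + 1), u)
    return lerp(top, bottom, v)


def darkening_mask(height, width, cells=3, min_gain=0.1, max_gain=0.6, seed=0):
    """Smooth per-pixel gain map in [min_gain, max_gain] (assumed range)."""
    rng = random.Random(seed)
    grads = make_gradients(cells, rng)
    bound = math.sqrt(0.5)  # approximate magnitude bound of 2D Perlin noise
    mask = []
    for r in range(height):
        row = []
        for c in range(width):
            n = perlin(c * cells / width, r * cells / height, grads)
            # Map noise to [0, 1]; clamp in case the bound is exceeded.
            t = min(1.0, max(0.0, 0.5 + 0.5 * n / bound))
            row.append(lerp(min_gain, max_gain, t))
        mask.append(row)
    return mask


def darken(image, mask):
    """Multiply each pixel by its gain to simulate uneven low illumination."""
    return [[p * g for p, g in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    sq_err, count = 0.0, 0
    for ref_row, est_row in zip(reference, estimate):
        for a, b in zip(ref_row, est_row):
            sq_err += (a - b) ** 2
            count += 1
    mse = sq_err / count
    return math.inf if mse == 0.0 else 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    frame = [[0.8] * 16 for _ in range(12)]   # synthetic well-lit frame
    dark = darken(frame, darkening_mask(12, 16, seed=1))
    print(f"PSNR of darkened frame vs. original: {psnr(frame, dark):.2f} dB")
```

In a training setup along these lines, the pair (darkened frame, original frame) would serve as network input and target, and the same PSNR function could score an enhanced output against the original. A low cell count keeps the gain map smooth, which is the property that makes noise of this kind a plausible stand-in for uneven illumination.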

Acknowledgements

The authors would like to thank Maximilian Seitzer for proofreading this work.

Author information

Authors and Affiliations

Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
Pablo Gómez, Marion Semmler, Anne Schützenberger & Michael Döllinger
ENT Department, University Hospital Regensburg, University Regensburg, Franz-Josef-Strauß-Allee 11, 93053, Regensburg, Germany
Christopher Bohr

Corresponding author

Correspondence to Pablo Gómez.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 323308998 under grant nos. DO1247/8-1 and BO4399/2-1.

About this article

Cite this article

Gómez, P., Semmler, M., Schützenberger, A. et al. Low-light image enhancement of high-speed endoscopic videos using a convolutional neural network. Med Biol Eng Comput 57, 1451–1463 (2019). https://doi.org/10.1007/s11517-019-01965-4

Received: 30 October 2018
Accepted: 20 February 2019
Published: 21 March 2019
Issue Date: 19 July 2019
DOI: https://doi.org/10.1007/s11517-019-01965-4
