Technical note
Videokymographic image processing: Objective parameters and user-friendly interface

https://doi.org/10.1016/j.bspc.2011.02.007

Abstract

Videolaryngostroboscopy (VLS) is undoubtedly a first choice examination technique in the diagnosis of several laryngeal pathologies. However, in case of low intensity or strong a-periodicity of the vocal sound, the VLS mechanism becomes ineffective in describing subsequent phases of the vocal cycle. To overcome such limitations, a new technique, called videokymography (VKG), was developed. VKG delivers images and displays the vibratory pattern from a single line selected from the whole VLS image, at the speed of approximately 8000 line-images/s. Despite its usefulness, parameter evaluation has been mostly based on visual inspection and no quantitative analysis of videokymographic images is commercially available at this time.

This article presents the VKG-Analyser, a new tool for measuring and tracking quantitative parameters from VKG images. Specifically, the left-to-right period, amplitude and phase ratios and the phase symmetry index are evaluated. For the case of incomplete glottis closure, the evaluation of the minimum distance between the folds was also implemented.

A digital image processing algorithm was developed and optimised for the analysis of VKG recordings that require intensity adjustment, noise removal and robust techniques for edge detection to avoid fluctuations of the grey levels in regions far from the vocal folds. The VKG-Analyser relies on a user-friendly interface that allows for the storage and retrieval of patients’ data and optimises the image analysis, according to a set of parameters that can be manually adjusted by the user.

The tool was successfully tested on a set of synthetic images and applied to real VKG images, in the cases of both complete and incomplete glottis closure.

The new software tool aims to provide fast, reliable and reproducible measures. When applied to a large set of data, it can define reference values for normal and pathological cases, providing a valid support for diagnosis and evaluation of surgical effectiveness.

Introduction

Along with the refinement of diagnostic and therapeutic techniques in phoniatrics and phonosurgery, the need for objective methods for vocal fold cycle evaluation, and its pathological or post-treatment changes, has gained increasing relevance.

Videolaryngostroboscopy (VLS), which is currently considered a first-choice test for the diagnosis of most laryngeal pathologies, bears some intrinsic limitations that restrict its clinical application. In fact, the stroboscopic image of the vocal fold vibration is basically an optical illusion perceived by the human eye, arising from the virtual reconstruction of adjacent phases of different vocal cycles, as sampled by the stroboscopic flashes at subsequent time instants. Hence, in the case of strong intensity deficiency or a-periodicity of the vocal fold vibrations, the stroboscopic technique, at the phonatory frequency, is ineffective in representing subsequent phases of the vocal-fold vibration cycle [20], [4], [27]. To overcome these limitations, research has developed in two main directions: digital high-speed videoendoscopy (HSV) and videokymography (VKG). HSV systems capture a large amount of physiological and dynamic information in a single examination [43], [5], [3], [6]. Although the technology for HSV capture is improving, the clinical application of these systems is limited because the device is very expensive. Digital kymography (DKG) is defined as kymography extracted from HSV. With DKG, the measurement position can be selected from any part of the vocal folds after the recording is performed [10], [12], [17], [42], [18], [24], [9]. With the latest DKG systems, the line resolution and the image rate are about the same as with the VKG method. Finally, some authors have combined VLS and VKG in a new technique, called videostrobokymography (VSK) [13], [34]. In VSK, the individual line-images are taken from the digitised successive stroboscopic video images. The advantage of the method is that it does not require a special high-speed video camera. The disadvantage is that it suffers from stroboscopic limitations and does not allow reliable viewing of irregular vibrations.

Hence, these approaches will not be analysed here; our focus will be on videokymography (VKG) and its high-resolution and low-cost characteristics. Photokymography, which could be considered a first approach to VKG, was introduced in 1984 [8]. It developed into VKG and has been improved in subsequent years, with many applications [11], [31], [32], [35], [36], [37], [38], [14], [40]. Though not currently common in daily clinical use, the new generation of VKG devices has appealing characteristics, and its extensive application in laryngological diagnosis is foreseen.

VKG can overcome the VLS limitations because it is capable of delivering images from a single line selected from the whole image, at a speed of approximately 8000 line-images/s, independent of the vocal sound characteristics. The selected line is marked in white on the screen, and the user can position it by moving the endoscope. The VKG recording is divided into video frames, i.e., segments of approximately 15/18 ms in duration. Images are not in colour, and continuous high-intensity light is needed [31], [35], [36], [37], [38], [14], [40], [16].
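As a rough illustration of the figures above, the following Python sketch computes how many line-images fit into one kymographic frame and how many vocal cycles such a frame would span. The function names and the 200 Hz fundamental frequency are assumptions chosen for illustration only; the line rate and frame durations are the approximate values quoted in the text.

```python
def lines_per_frame(line_rate_hz: float, frame_ms: float) -> int:
    """Number of line-images stacked into one kymographic video frame."""
    return round(line_rate_hz * frame_ms / 1000.0)


def cycles_per_frame(f0_hz: float, frame_ms: float) -> float:
    """Number of vocal cycles covered by one frame at fundamental frequency f0."""
    return f0_hz * frame_ms / 1000.0


if __name__ == "__main__":
    line_rate_hz = 8000.0   # approximate line rate quoted in the text
    f0_hz = 200.0           # hypothetical phonation frequency (assumption)
    for frame_ms in (15.0, 18.0):
        print(
            f"{frame_ms:.0f} ms frame: "
            f"{lines_per_frame(line_rate_hz, frame_ms)} lines, "
            f"{cycles_per_frame(f0_hz, frame_ms):.1f} cycles at {f0_hz:.0f} Hz"
        )
```

Under these assumptions, a single frame holds on the order of a hundred line-images and only a few vocal cycles, which is why several consecutive frames are usually inspected.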

In the first generation of VKG devices, such as the KayElemetrics VKG Camera, Model 8900, also considered in our first study [16], [22], the line selection is fixed to the first line of the VLS image. Before activating the VKG mode using a footswitch, the user has to position the desired line at the upper edge of the VLS image. With this device, the two working modes (VLS and VKG) are mutually exclusive, preventing the operator from seeing the scan position while using the kymographic mode. This restriction has been removed with the new device considered here, as described in Section 2.

The selected portion of the vocal fold is registered for all medial–lateral movements during several vocal cycles. Movements of the selected glottal line are displayed on the monitor (time is on the vertical axis). A picture is shown in Fig. 1 (upper plots).

Images are digitally stored for further examination in slow-motion or for printing, which allows for visual comparison between different patients or different laryngoscopical pictures from the same subject corresponding to different stages of her/his clinical history (e.g., pre–post surgical/pharmacological treatment).

As a drawback, the positioning and orientation of the device with respect to the glottis are critical in VKG, making the examination more demanding than with VLS, because greater co-operation is needed from the patient [28]. Recently, a new VKG system was developed, whose features can overcome some of these drawbacks [29]. Although developed for the old VKG device [16], the VKG image analysis tool presented here, called the VKG-Analyser, was adjusted and even simplified to deal with video recordings obtained with the new system with improved features. The VKG-Analyser could also work on DKG images, because it is independent of the image acquisition device.

The article is organised as follows: the VKG system is described in Section 2. Objective VKG parameters and their evaluation with VKG-Analyser are described in Section 3, which is also devoted to the user interface characteristics. In Section 4, the experimental results are described. Concluding remarks are offered in Section 5.

VKG system

The first VKG system considered for developing the software tool was the KayElemetrics VKG Camera, Model 8900 [16] based on a special camera that can operate in standard and high-speed modes [22]. In this article a new generation videokymographic system that not only provides kymographic images, but also simultaneously presents laryngoscopic images for navigating the endoscope to the desired position [29] was used. The new device is still based on a special camera that can operate in standard

VKG measurements

As outlined in Section 1, the analysis of VKG images is currently based on visual inspection only. It relies on the subjective perception of vocal fold symmetry, the presence or absence of glottis deficiency in any cycle phase, the amplitude of vibration of one vocal fold with respect to the other, and so on. Such evaluation makes it difficult to perform comparisons among different subjects, different pathological conditions and multicentric databases. Some objective measures have been
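To make the idea of objective measures concrete, here is a minimal Python sketch that extracts fold edges from a kymogram and derives two simple quantities from them. It assumes a boolean glottis mask with time along the rows (as in the kymographic display) and a known midline column. The function names, the midline input and the amplitude-ratio definition are illustrative assumptions, not the definitions used by the VKG-Analyser. "Left" and "right" refer to image sides.

```python
import numpy as np


def fold_edges(mask):
    """Per-row (per time instant) left edge, right edge and width of the glottis.

    `mask` is a 2-D boolean array, True where the glottis is open.
    Rows with a closed glottis get NaN edges and zero width.
    """
    mask = np.asarray(mask, dtype=bool)
    n_rows = mask.shape[0]
    left = np.full(n_rows, np.nan)
    right = np.full(n_rows, np.nan)
    width = np.zeros(n_rows)
    for i in range(n_rows):
        cols = np.flatnonzero(mask[i])
        if cols.size:
            left[i], right[i] = cols[0], cols[-1]
            width[i] = cols[-1] - cols[0] + 1
    return left, right, width


def amplitude_ratio(left, right, midline):
    """Peak excursion of the left edge over that of the right edge.

    Excursions are measured from an assumed midline column (hypothetical input).
    """
    a_left = np.nanmax(midline - left)
    a_right = np.nanmax(right - midline)
    return a_left / a_right


def minimum_distance(width):
    """Smallest glottal width in pixels; above zero means incomplete closure."""
    return float(np.min(width))
```

A value of `amplitude_ratio` close to 1 would indicate similar excursions on both sides, while a non-zero `minimum_distance` would flag a recording in which the folds never fully close.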

Experimental results

To test the VKG-Analyser, a set of synthetic images was built up that closely resembled a subset of those reported in [39], which represents an almost complete and perhaps the only available detailed reference set. Of course such images are non-realistic as far as the available grey levels are concerned, which are limited to three: black (0), grey (128) and white (255). Changing the threshold allows for inclusion of the grey part of the image in the ROI, which can thus change the area
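The effect of the threshold on a three-level image can be sketched in Python as follows. The sinusoidal glottal shape, the image size and the function names are assumptions made for illustration and do not reproduce the reference images of [39]. Only the three grey levels (0, 128, 255) come from the text.

```python
import numpy as np

BLACK, GREY, WHITE = 0, 128, 255


def synthetic_kymogram(n_rows=64, n_cols=64, cycles=4, max_half_width=10, grey_band=3):
    """Three-level test image with time along the rows.

    Black marks the open glottis, a grey band surrounds it, and the
    background is white. The half-width follows |sin| (an assumed shape).
    """
    img = np.full((n_rows, n_cols), WHITE, dtype=np.uint8)
    centre = n_cols // 2
    t = np.arange(n_rows)
    half = np.rint(max_half_width * np.abs(np.sin(np.pi * cycles * t / n_rows))).astype(int)
    for i, h in enumerate(half):
        if h == 0:
            continue  # closed phase: row stays white
        img[i, centre - h - grey_band:centre + h + grey_band] = GREY
        img[i, centre - h:centre + h] = BLACK
    return img


def roi_area(img, threshold):
    """Number of pixels darker than the threshold (the region of interest)."""
    return int(np.count_nonzero(img < threshold))


if __name__ == "__main__":
    img = synthetic_kymogram()
    print("ROI area, threshold 100 (black only):", roi_area(img, 100))
    print("ROI area, threshold 200 (black + grey):", roi_area(img, 200))
```

With a threshold between 0 and 128 only the black pixels are counted, whereas a threshold between 128 and 255 also pulls the grey band into the ROI and enlarges its area.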

Conclusions

VKG is a new device that delivers high-speed images from a single line selected from the whole VLS image, allowing for a detailed qualitative analysis of vocal fold vibration. Recently, first results have been proposed in the literature for quantitative analysis of VKG images, in the case of vocal fold closure.

In this article, the VKG-Analyser, a new user-friendly tool, is proposed for the extraction and tracking of objective parameters from VKG images. The VKG-Analyser performs the evaluation

Acknowledgments

The authors gratefully acknowledge Dr. Švec and Dr. Vetesnik, Olomouc Univ., Faculty of Science, Dept. of Experimental Physics, Biophysics Lab., for their support and useful suggestions in the development of the proposed VKG image analysis tool.

This work has been developed with the contribution of COST Action 2103 “Advances in voice quality assessment”, and Ente Cassa di Risparmio di Firenze, “Interdisciplinary Laboratory of Biomedical Acoustics–LIAB”, project n. 2007/0754.

References (43)

  • D. Deliyski et al.

    Clinical implementation of laryngeal high-speed videoendoscopy: challenges and evolution

    Folia Phoniatr. Logop.

    (2008)
  • M. Döllinger et al.

    Vibration parameter extraction from endoscopic image series of the vocal folds

    IEEE Trans. Biomed. Eng.

    (2002)
  • V. Gall

    Strip kymography of the glottis

    Eur. Arch. Otorhinolaryngol.

    (1984)
  • S. Hertegard et al.

    High-speed imaging: applications and development

    Logoped. Phoniatr. Vocol.

    (1998)
  • M. Hess et al.

    High-speed, light-intensified digital imaging of vocal fold vibrations in high optical resolution via indirect microlaryngoscopy

    Ann. Otol. Rhinol. Laryngol.

    (1993)
  • H. Hirose

    High-speed digital imaging of vocal fold vibration

    Acta Otolaryngol (Stockh)

    (1998)
  • Y. Isogai

    Laryngostrobography by the newly developed strobo-motion-analyzer

    Larynx Jpn.

    (1994)
  • J. Jiang et al.

    Quantitative study of mucosal wave via videokymography in canine larynges

    Laryngoscope

    (2000)
  • M. Kass et al.

    Snakes: active contour models

    Int. J. Comput. Vis.

    (1988)
  • KayElemetrics, High-Speed Video (HSV) Model 9700: Instruction Manual

    (2002)