Videokymographic image processing: Objective parameters and user-friendly interface

doi:10.1016/j.bspc.2011.02.007

Biomedical Signal Processing and Control

Volume 7, Issue 2, March 2012, Pages 192-201

https://doi.org/10.1016/j.bspc.2011.02.007

Abstract

Videolaryngostroboscopy (VLS) is undoubtedly a first-choice examination technique in the diagnosis of several laryngeal pathologies. However, in the case of low intensity or strong a-periodicity of the vocal sound, the VLS mechanism becomes ineffective in describing subsequent phases of the vocal cycle. To overcome such limitations, a new technique, called videokymography (VKG), was developed. VKG delivers images and displays the vibratory pattern from a single line selected from the whole VLS image, at a speed of approximately 8000 line-images/s. Despite the usefulness of VKG, parameter evaluation has been mostly based on visual inspection, and no quantitative analysis of videokymographic images is commercially available at this time.

This article presents the VKG-Analyser, a new tool for measuring and tracking quantitative parameters from VKG images. Specifically, the left-to-right period, amplitude and phase ratios and the phase symmetry index were evaluated. For the case of incomplete glottis closure, the measurement of the minimum distance between the folds was also implemented.
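To make these parameters concrete, the sketch below computes a left-to-right period ratio, a left-to-right amplitude ratio and the minimum distance between the folds from two edge contours. It is only an illustration under stated assumptions: the abstract does not give the formulas used by the VKG-Analyser, so the definitions here (peak-to-peak amplitude, mean spacing between excursion maxima) and the names `peak_to_peak`, `mean_period` and `vkg_parameters` are hypothetical. The phase-related parameters are omitted because their definition is not stated in this text.

```python
def peak_to_peak(signal):
    """Amplitude as max minus min edge position (an assumed definition)."""
    return max(signal) - min(signal)


def mean_period(signal):
    """Mean spacing, in line-images, between successive local maxima."""
    peaks = [i for i in range(1, len(signal) - 1)
             if signal[i - 1] < signal[i] >= signal[i + 1]]
    if len(peaks) < 2:
        raise ValueError("at least two vibration cycles are required")
    return (peaks[-1] - peaks[0]) / (len(peaks) - 1)


def vkg_parameters(left_edge, right_edge):
    """left_edge / right_edge: edge column per line-image (hypothetical input)."""
    # The left fold moves towards smaller columns when opening, so negate it
    # to make maximum lateral excursion a local maximum for both folds.
    left_excursion = [-x for x in left_edge]
    widths = [r - l for l, r in zip(left_edge, right_edge)]
    return {
        "period_ratio": mean_period(left_excursion) / mean_period(right_edge),
        "amplitude_ratio": peak_to_peak(left_edge) / peak_to_peak(right_edge),
        "min_width": min(widths),  # a value above 0 means incomplete closure
    }


# Made-up contours: four cycles of eight line-images each.
left = [9, 9, 8, 7, 8, 9, 9, 9] * 4
right = [10, 11, 12, 13, 14, 13, 12, 11] * 4
params = vkg_parameters(left, right)
print(params)
```

In this made-up example the two folds share the same period, the left fold vibrates with half the amplitude of the right one, and the folds never touch, which corresponds to the incomplete-closure case.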

A digital image processing algorithm was developed and optimised for the analysis of VKG recordings that require intensity adjustment, noise removal and robust techniques for edge detection to avoid fluctuations of the grey levels in regions far from the vocal folds. The VKG-Analyser relies on a user-friendly interface that allows for the storage and retrieval of patients’ data and optimises the image analysis, according to a set of parameters that can be manually adjusted by the user.

The tool was successfully tested on a set of synthetic images and applied to real VKG images, in cases of both complete and incomplete glottis closure.

The new software tool aims to provide fast, reliable and reproducible measures. When applied to a large set of data, it can define reference values for normal and pathological cases, providing a valid support for diagnosis and evaluation of surgical effectiveness.

Introduction

Along with the refinement of diagnostic and therapeutic techniques in phoniatrics and phonosurgery, the need for objective methods for vocal fold cycle evaluation, and its pathological or post-treatment changes, has gained increasing relevance.

Videolaryngostroboscopy (VLS), which is currently considered a first-choice test for the diagnosis of most laryngeal pathologies, bears some intrinsic limitations that restrict its clinical application. In fact, the stroboscopic image of the vocal fold vibration is basically an optical illusion made by the human eye, arising from the virtual reconstruction of adjacent phases of different vocal cycles, as given by the stroboscopic flashes in subsequent time instants. Hence, in the case of strong intensity deficiency or a-periodicity of the vocal fold vibrations, the stroboscopic technique, at the phonatory frequency, is ineffective in representing subsequent phases of the vocal-fold vibration cycle [20], [4], [27]. To overcome these limitations, research has developed in two main directions: digital high-speed videoendoscopy, and videokymography (VKG). Digital high-speed videoendoscopy (HSV) systems capture a large amount of physiological and dynamic information in a single examination [43], [5], [3], [6]. Although the technology for HSV capture is improving, the clinical application of these systems is limited, because the device is very expensive. Digital kymography (DKG) is defined as kymography extracted from HSV. With DKG the measurement position can be selected from any part of the vocal folds after the recording is performed [10], [12], [17], [42], [18], [24], [9]. With the latest DKG systems, the line resolution and the image rate are about the same as with the VKG method. Finally, some authors have combined VLS and VKG in a new technique, called videostrobokymography (VSK) [13], [34]. In VSK, the individual line-images are taken from the digitised successive stroboscopic video images. The advantage of the method is that it does not require a special high-speed video camera. The disadvantage is that it suffers from stroboscopic limitations and does not allow reliable viewing of irregular vibrations.
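The idea behind DKG, a kymogram extracted from an HSV recording at a line chosen after the fact, can be illustrated in a few lines. The sketch assumes that the recording is available as a list of greyscale frames, each a list of pixel rows; `extract_kymogram` is a hypothetical helper and not part of any DKG product.

```python
def extract_kymogram(frames, row):
    """Stack the same image row from every frame; time runs down the rows."""
    return [list(frame[row]) for frame in frames]


# Four tiny 3x5 "frames" whose central pixel changes from frame to frame.
frames = [
    [[255] * 5, [255, 255, v, 255, 255], [255] * 5]
    for v in (0, 64, 128, 64)
]
kymogram = extract_kymogram(frames, row=1)
for line_image in kymogram:
    print(line_image)
```

Because the full frames are stored, `row` can be changed and the kymogram recomputed at any position without repeating the examination.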

Hence, these approaches are not analysed here; the focus is on videokymography (VKG) and its high-resolution, low-cost characteristics. Photokymography, which can be considered a first approach to VKG, was introduced in 1984 [8]. It developed into VKG and has been improved in subsequent years, with many applications [11], [31], [32], [35], [36], [37], [38], [14], [40]. Though not currently common in daily clinical use, the new generation of VKG devices has appealing characteristics, and its extensive application in laryngological diagnosis is foreseen.

VKG can overcome the VLS limitations because it is capable of delivering images from a single line selected from the whole image, at a speed of approximately 8000 line-images/s, independent of the vocal sound characteristics. The selected line is marked in white on the screen, and the user can position it by moving the endoscope. The VKG recording is divided into video frames, i.e., segments of approximately 15/18 ms in duration. Images are not in colour, and continuous high-intensity light is needed [31], [35], [36], [37], [38], [14], [40], [16].
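The figures quoted above fix how much of the vibration a single video frame can show. The short calculation below uses the stated line rate of about 8000 line-images/s, and takes 15 ms and 18 ms as two illustrative frame durations (reading "15/18 ms" as a pair of values is an interpretation). The fundamental frequency of 200 Hz is an assumed example value, not a figure from the text.

```python
LINE_RATE_HZ = 8000  # approximate VKG line rate quoted in the text


def lines_per_frame(frame_ms, line_rate_hz=LINE_RATE_HZ):
    """Number of line-images recorded during one video frame."""
    return line_rate_hz * frame_ms / 1000


def cycles_per_frame(f0_hz, frame_ms):
    """Number of vocal fold cycles visible in one video frame."""
    return f0_hz * frame_ms / 1000


def lines_per_cycle(f0_hz, line_rate_hz=LINE_RATE_HZ):
    """Temporal sampling of a single vibration cycle."""
    return line_rate_hz / f0_hz


F0_HZ = 200  # assumed example fundamental frequency
for frame_ms in (15, 18):
    print(frame_ms, lines_per_frame(frame_ms), cycles_per_frame(F0_HZ, frame_ms))
print(lines_per_cycle(F0_HZ))
```

Under these assumptions a frame holds between 120 and 144 line-images, and a 200 Hz vibration is sampled by 40 line-images per cycle, regardless of how regular the vibration is.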

In the first generation of VKG devices, such as the KayElemetrics VKG Camera, Model 8900, also considered in our first study [16], [22], the line selection was fixed to the first line of the VLS image. Before activating the VKG mode using a footswitch, the user had to position the desired line at the upper edge of the VLS image. With this device, the two working modes (VLS and VKG) were mutually exclusive, preventing the operator from seeing the scan position while using the kymographic mode. This restriction has been removed with the new device considered here, as described in Section 2.

The selected portion of the vocal fold is registered for all medial–lateral movements during several vocal cycles. Movements of the selected glottal line are displayed on the monitor (time is on the vertical axis). A picture is shown in Fig. 1 (upper plots).

Images are digitally stored for further examination in slow-motion or for printing, which allows for visual comparison between different patients or different laryngoscopical pictures from the same subject corresponding to different stages of her/his clinical history (e.g., pre–post surgical/pharmacological treatment).

As a drawback, the positioning and orientation of the device with respect to the glottis are critical in VKG, making the examination more demanding than with VLS, because greater co-operation is needed from the patient [28]. Recently, a new VKG system was developed, whose features can overcome some of these drawbacks [29]. Although originally developed for the old VKG device [16], the VKG image analysis tool presented here, called the VKG-Analyser, was adjusted and even simplified to deal with video recordings obtained with the new system with improved features. The VKG-Analyser could also work on DKG images, because it is independent of the image acquisition device.

The article is organised as follows: the VKG system is described in Section 2. Objective VKG parameters and their evaluation with VKG-Analyser are described in Section 3, which is also devoted to the user interface characteristics. In Section 4, the experimental results are described. Concluding remarks are offered in Section 5.

Section snippets

VKG system

The first VKG system considered for developing the software tool was the KayElemetrics VKG Camera, Model 8900 [16] based on a special camera that can operate in standard and high-speed modes [22]. In this article a new generation videokymographic system that not only provides kymographic images, but also simultaneously presents laryngoscopic images for navigating the endoscope to the desired position [29] was used. The new device is still based on a special camera that can operate in standard

VKG measurements

As outlined in Section 1, the analysis of VKG images is currently based on visual inspection only. It relies on the subjective perception of vocal fold symmetry, the presence or absence of glottis deficiency in any cycle phase, the amplitude of vibration of one vocal fold with respect to the other, and so on. Such evaluation makes it difficult to perform comparisons among different subjects, different pathological conditions and multicentric databases. Some objective measures have been

Experimental results

To test the VKG-Analyser, a set of synthetic images was built up that closely resembled a subset of those reported in [39], which represents an almost complete and perhaps the only available detailed reference set. Of course such images are non-realistic as far as the available grey levels are concerned, which are limited to three: black (0), grey (128) and white (255). Changing the threshold allows for inclusion of the grey part of the image in the ROI, which can thus change the area
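The effect of the threshold on a three-level synthetic image can be illustrated with a small sketch. The image below is not one of the synthetic test images used in the article. It is a made-up kymogram with a black glottal core, a grey band on each side and a white background, and the names `synthetic_kymogram` and `roi_area` as well as the geometry are assumptions.

```python
BLACK, GREY, WHITE = 0, 128, 255


def synthetic_row(width, centre, black_half, grey_margin):
    """One line-image: black core, grey band on each side, white elsewhere."""
    row = []
    for x in range(width):
        d = abs(x - centre)
        if d <= black_half:
            row.append(BLACK)
        elif d <= black_half + grey_margin:
            row.append(GREY)
        else:
            row.append(WHITE)
    return row


def synthetic_kymogram(half_widths, width=21, centre=10, grey_margin=2):
    """Stack line-images whose black core follows the given half-widths."""
    return [synthetic_row(width, centre, h, grey_margin) for h in half_widths]


def roi_area(image, threshold):
    """Number of pixels darker than the threshold."""
    return sum(1 for row in image for v in row if v < threshold)


# Two triangular opening/closing cycles of six line-images each.
image = synthetic_kymogram([0, 1, 2, 3, 2, 1] * 2)
print(roi_area(image, threshold=64))   # black pixels only
print(roi_area(image, threshold=200))  # black and grey pixels
```

With a threshold between black and grey the ROI contains only the black core; raising it above 128 adds the grey band and, in this example, doubles the measured area.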

Conclusions

VKG is a new device that delivers high-speed images from a single line selected from the whole VLS image, allowing for a detailed qualitative analysis of vocal fold vibration. Recently, first results have been proposed in the literature for quantitative analysis of VKG images, in the case of vocal fold closure.

In this article, the VKG-Analyser, a new user-friendly tool, is proposed for the extraction and tracking of objective parameters from VKG images. The VKG-Analyser performs the evaluation

Acknowledgments

The authors gratefully acknowledge Dr. Švec and Dr. Vetesnik, Olomouc Univ., Faculty of Science, Dept. of Experimental Physics, Biophysics Lab., for their support and useful suggestions in the development of the proposed VKG image analysis tool.

This work has been developed with the contribution of COST Action 2103 “Advances in voice quality assessment”, and Ente Cassa di Risparmio di Firenze, “Interdisciplinary Laboratory of Biomedical Acoustics–LIAB”, project n. 2007/0754.

References (43)

H.S. Bonilha et al. Period and glottal width irregularities in vocally normal speakers. J. Voice (2008)

C. Manfredi et al. Objective vocal fold vibration assessment from videokymographic images. Biomed. Signal Process. Control (2006)

T. McInerney et al. Deformable models in medical image analysis: a survey. Med. Image Anal. (1996)

J. Švec et al. Videokymography: high-speed line scanning of vocal fold vibration. J. Voice (1996)

Verdonck-de-Leeuw et al. Deviant vocal fold vibration as observed during videokymography: the effect on voice quality. J. Voice (2001)

T. Wittenberg et al. Functional imaging of vocal fold vibration: digital multislice high-speed kymography. J. Voice (2000)

S. Bianchi et al. Objective vocal fold vibration assessment from videokymographic images.

J.M. Bland et al. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet (1986)

D. Colden et al. Stroboscopic assessment of vocal-fold atypia and early cancer. Ann. Otol. Rhinol. Laryngol. (2001)

D. Deliyski. Endoscope motion compensation for laryngeal high-speed endoscopy. J. Voice (2005)

D. Deliyski et al. Clinical implementation of laryngeal high-speed videoendoscopy: challenges and evolution. Folia Phoniatr. Logop. (2008)

M. Döllinger et al. Vibration parameter extraction from endoscopic image series of the vocal folds. IEEE Trans. Biomed. Eng. (2002)

V. Gall. Strip kymography of the glottis. Eur. Arch. Otorhinolaryngol. (1984)

S. Hertegard et al. High-speed imaging: applications and development. Logoped. Phoniatr. Vocol. (1998)

M. Hess et al. High-speed, light-intensified digital imaging of vocal fold vibrations in high optical resolution via indirect microlaryngoscopy. Ann. Otol. Rhinol. Laryngol. (1993)

H. Hirose. High-speed digital imaging of vocal fold vibration. Acta Otolaryngol (Stockh) (1998)

Y. Isogai. Laryngostrobography by the newly developed strobo-motion-analyzer. Larynx Jpn. (1994)

J. Jiang et al. Quantitative study of mucosal wave via videokymography in canine larynges. Laryngoscope (2000)

M. Kass et al. Snakes: active contour models. Int. J. Comput. Vis. (1988)

KayElemetrics, High-Speed Video (HSV) Model 9700: Instruction Manual (2002)

