Collective spectral pattern complexity analysis of voicing in normal males and larynx cancer patients following radiotherapy

https://doi.org/10.1016/j.bspc.2006.07.001

Abstract

Normal male voicing is defined, and voicing recovery after radiotherapy for larynx cancer quantified, using spectral domain complexity analysis of electro-glottogram conductance variations measured across the larynx during vowel phonation. These variations directly correlate with vocal fold vibrations that drive voice production. Approximate entropy is shown to concisely quantify the collective spectral pattern of the sustained impedance waveform after normalisation with respect to the varying fundamental frequency and power. It reveals a double banded reference standard in normal males. Forty-eight male larynx cancer patients were studied in parallel with an unrestricted perceptual analysis before and 1 year after radiotherapy. Two-thirds of patients had spectral approximate entropy values close to normal approximate entropy reference standards after 1 year. A quarter of patients showed reduced approximate entropy, predominantly in the most aberrant perceptual categories. Collective spectral pattern complexity analysis of vowel phonation has the potential to be a reliable, single parameter measure of voicing quality in these cancer patients.

Introduction

United Kingdom cancer statistics for 2001 show that the larynx is the site for nearly a third of all 7820 new head and neck cancers and that well over 4 times as many men as women suffered from the disease [1]. Hence, it is as prevalent as cervix cancer in women though it attracts far less public attention. The 5 year survival of larynx cancer patients following treatment is good, at approximately two-thirds. Hence, subsequent quality of life, particularly swallowing and voice preservation, is important for a large number of individuals seeking to resume normal life. Radiotherapy arguably has fewer side effects than surgery, which is self-evidently more invasive. Whereas surgery involves the excision or ablation of tissues harboring diseased cells, radiotherapy delivers a tumouricidal ionizing radiation dose using penetrating X-rays delivered in a manner that leaves healthy tissues intact and able to recover. Writing in the New England Journal of Medicine in 2003, Forastiere et al. noted that by using chemotherapy and radiotherapy appropriately ‘in most patients with laryngeal cancer, the disease can be managed without a primary surgical approach’ [2]. This reflects a shift in clinical practice from surgery to radiotherapy that began in the 1990s, with larynx preservation a key factor.

Larynx cancer irradiation may leave the targeted tissues intact but it does perturb vocal fold functionality for months after treatment [3], which in turn directly influences voice quality [4], [5], [6], [7]. An essential part of verbal communication is vowel generation, which is underpinned by fold vibration. The ability to produce vowel sounds and voicing in general is traditionally informed by professional opinion, specifically by the speech and language therapist (SALT). However, SALT terminology and the proliferation of assessment parameters have been cause for concern in head and neck radiotherapy [8]. With so many features selectively assessed, it has not been possible to define either absolutely or concisely what constitutes a ‘normal’ voice. Similarly, it has not been possible to explain how cancer patients subjected to intense vocal fold irradiation during radiotherapy can recover voicing to a level that could be considered to be ‘normal’. Hence, tracking recovery is problematic and made worse by inter-observer errors. Given the nature of human perception of acoustic phenomena, it is not surprising that SALT references to the importance of spectral features usually found in signal processing are commonplace [9]. The variability of frequency and amplitude of pressure and vocal fold contact are labelled ‘jitter’ and ‘shimmer’, respectively. Similarly, ‘breathiness’ and ‘whisper’, arising from arguably undesirable air flows during phonation, are factors that would be recognized by physical scientists as reducing signal-to-noise ratio in voicing. The spectral envelope [9] is used to understand the decay of harmonics in patients. This solid scientific basis, supported by the availability of physical measurements, offers a route to rationalizing the objective description of normal and aberrant voicing.

The electro-glottogram (EGG) is one instrument available to the SALT. An example is shown in Fig. 1(a). It measures the impedance changes across the larynx during voicing, producing time series with a characteristic waveform that reflects the vibratory action of the vocal folds, as seen in Fig. 1(b) [10]. However, the important and variable detail present in an otherwise elegantly simple EGG waveform is hard to interpret, even for the most experienced of SALTs [7], [11], [12]. The detail in the corresponding EGG spectral pattern is also difficult to interpret, even though the frequency domain has instantly recognizable, gross features in the form of the fundamental and trailing harmonic peaks (Fig. 1(c)). In 2004 Moore et al. suggested the first machine-computed, single-parameter reference standards for normal voicing based on the EGG measurements and compared these to a small sample of patients known to have significantly perturbed voices [13]. The key steps were to apply complexity analysis in the spectral domain, rather than the time domain, and to avoid the selection of specific peaks or features by analysing the spectral pattern collectively.
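
To make the idea of treating the spectral pattern collectively more concrete, the sketch below computes the power spectrum of a synthetic periodic waveform, re-expresses the frequency axis in multiples of the fundamental and scales the power by its total. It is a minimal illustration under stated assumptions, not the processing used in [13]: the synthetic signal, the function names `power_spectrum` and `normalise_spectrum`, the choice of the strongest non-DC bin as the fundamental and the naive DFT are all hypothetical choices made for this example.

```python
import cmath
import math

def power_spectrum(signal, sample_rate):
    """Return (frequencies in Hz, power) for the non-negative bins of a naive O(N^2) DFT."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        freqs.append(k * sample_rate / n)
        power.append(abs(coeff) ** 2)
    return freqs, power

def normalise_spectrum(freqs, power):
    """Express frequency in multiples of the fundamental and power as a fraction of the total.

    Assumption of this sketch: the fundamental is the strongest non-DC bin.
    """
    peak = max(range(1, len(power)), key=lambda k: power[k])
    f0 = freqs[peak]
    total = sum(power[1:])  # total power, excluding the DC bin
    harmonic_axis = [f / f0 for f in freqs]
    relative_power = [p / total for p in power]
    return f0, harmonic_axis, relative_power

# Synthetic stand-in for a voiced waveform (not EGG data):
# a 125 Hz fundamental plus two weaker harmonics.
sample_rate = 2000
n_samples = 400  # 0.2 s, a whole number of periods so peaks fall on DFT bins
signal = [
    math.sin(2 * math.pi * 125 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 250 * t / sample_rate)
    + 0.25 * math.sin(2 * math.pi * 375 * t / sample_rate)
    for t in range(n_samples)
]
freqs, power = power_spectrum(signal, sample_rate)
f0, harmonic_axis, relative_power = normalise_spectrum(freqs, power)
```

After this rescaling, two recordings with different pitch and loudness place their harmonics at the same positions on `harmonic_axis`, so the whole pattern can be compared without picking out individual peaks.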

This paper shows that the EGG can be deployed to differentiate the healthy normal male population, quantify pathological voicing in pre-treatment male larynx cancer patients and track the pattern of patient recovery following radiotherapy. The approach reported is based on the regularity statistic ‘approximate entropy’ (ApEn) [14] applied across a suitably normalised frequency spectrum for vowel phonation, which has been decoupled from the effects of drift in phonation fundamental frequency (Fig. 1(c)).
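
For readers unfamiliar with the statistic, the following sketch implements approximate entropy ApEn(m, r) in its standard form as a regularity measure over a sequence of values. It is an illustration only: the function name `approximate_entropy`, the default template length `m=2`, the tolerance `r=0.2` and the two test sequences are assumptions of this example and are not taken from the paper, where the input would be the normalised spectral values rather than the toy sequences used here.

```python
import math
import random

def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) of a sequence.

    m is the template length and r the absolute tolerance (both
    illustrative defaults). Distances use the maximum absolute
    difference and self-matches are counted, so the logarithm is
    always defined.
    """
    n = len(series)

    def phi(length):
        templates = [series[i:i + length] for i in range(n - length + 1)]
        log_sum = 0.0
        for a in templates:
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            log_sum += math.log(matches / len(templates))
        return log_sum / len(templates)

    return phi(m) - phi(m + 1)

# Toy comparison: a strictly alternating sequence versus pseudo-random values.
regular = [0.0, 1.0] * 50
rng = random.Random(0)
irregular = [rng.random() for _ in range(200)]

apen_regular = approximate_entropy(regular)
apen_irregular = approximate_entropy(irregular)
```

The alternating sequence is highly predictable and scores close to zero, whereas the pseudo-random sequence scores noticeably higher; a single number of this kind is what allows a whole pattern to be summarised without selecting individual features.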

Theory

Vowel phonation is predominantly driven by vocal fold vibration, which is accompanied by impedance variations across the thyroid area. Therefore, changes to fold vibration caused by physical damage arising from malignant disease and associated therapeutic irradiation are potentially reflected in changes to the impedance signature. Trans-larynx impedance changes can be detected during phonation via the EGG [10], which correlates well with actual vocal fold vibration and, when expertly measured,

Methodology

Eighty-nine male volunteers provided the reference standard for this study. Each subject was connected to a Laryngograph machine and asked to phonate the sustained vowel /i/ for up to 4 seconds. The Laryngograph outputs EGG impedance waveforms, which were digitised at a sampling rate of 20 kHz to produce 16-bit floating point values. The resultant data files, excluding four compromised files, were subjected to ApEn complexity analysis using software written in IDL from Research Systems

Results

Fig. 3 shows the ApEn complexity distribution for the healthy male normals reported by Moore et al. [13]. The bimodal nature of these data was tested by Gaussian mixture model fitting using maximum likelihood [18]. Moore et al. concluded (p < 0.001) that two normal groups G1 and G2 existed, characterised by complexity values 0.340 (±0.035) and 0.183 (±0.057) with relative weights 62 and 38%, respectively. Members of G1 exhibited strong FHN–fPSD features whilst those in G2 were weak, especially in the

Discussion

Of the 48 cancer cases considered, ApEn analysis indicated that two-thirds would develop improved vocal fold functionality 1 year after radiotherapy and could be objectively classified as normal (i.e. well within the G1 and G2 reference standards). Only one-quarter of cases would be below normal voicing bounds and distinctly pathological.
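
For orientation, the G1 and G2 reference standards referred to above can be written as a two-component Gaussian mixture using the values quoted in the Results. The sketch below evaluates that mixture and the relative membership of a given ApEn value in each group. It is not the authors' classification rule: treating the quoted ± figures as standard deviations is an assumption of this example, and the names `GROUPS`, `mixture_density` and `group_responsibilities` are hypothetical.

```python
import math

# Group values as quoted in the Results. Assumption of this sketch:
# the "±" figures are interpreted as standard deviations.
GROUPS = {
    "G1": {"mean": 0.340, "sd": 0.035, "weight": 0.62},
    "G2": {"mean": 0.183, "sd": 0.057, "weight": 0.38},
}

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def mixture_density(x):
    """Density of the weighted two-group mixture at ApEn value x."""
    return sum(g["weight"] * gaussian_pdf(x, g["mean"], g["sd"])
               for g in GROUPS.values())

def group_responsibilities(x):
    """Posterior probability of each group given x, under the mixture."""
    total = mixture_density(x)
    return {
        name: g["weight"] * gaussian_pdf(x, g["mean"], g["sd"]) / total
        for name, g in GROUPS.items()
    }

at_g1_centre = group_responsibilities(0.340)
at_g2_centre = group_responsibilities(0.183)
```

Under these assumptions a value at either group centre is attributed almost entirely to that group, which is one simple way to picture what a double-banded reference standard means for an individual measurement.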

Fig. 4 demonstrates that patients assigned a less aberrant pre-treatment category by SALT perceptual analysis have improved ApEn post-treatment. This takes the

Conclusion

ApEn complexity analysis of the collective spectral pattern derived from trans-larynx impedance time series has allowed the recovery pattern of vocal fold functionality and voicing in male radiotherapy cancer cases to be examined. Using a single objective parameter to quantify the collective spectral pattern of vowel phonation, the majority of radiotherapy patients are seen to recover to levels of normal ApEn seen in the general, healthy male population. Many patients recover to at least the

References (19)

  • A. Fourcin, Electrolaryngographic assessment of vocal fold function, J. Phon. (1986)
  • Cancer-Stats Incidence, CRUK, March...
  • A. Forastiere, Concurrent chemotherapy and radiotherapy for organ preservation in advanced laryngeal cancer, N. Engl. J. Med. (2003)
  • M.S. Benninger et al., Factors associated with recurrence & voice quality following radiation therapy for T1 & T2 glottic carcinomas, Laryngoscope (1994)
  • D. Hoyt et al., The effect of head & neck radiation therapy on voice quality, Laryngoscope (1992)
  • C.J. Moore et al., Computerised quantification & 3D-visualisation of voice quality changes following radiotherapy for carcinoma of the larynx
  • J.G. Spector et al., Stage I (T1, M0, N0) squamous cell carcinoma of the laryngeal glottis: therapeutic results & voice preservation, Head Neck (1999)
  • I. Verdonck-De Leeuw et al., The effect of radiotherapy on various acoustical, clinical and perceptual pitch measures
  • R. Baken et al., Voice measurement: is more better?, Logoped. Phoniatr. Vocol. (1997)
There are more references available in the full text version of this article.
