Automatic detection of glottal stop in cleft palate speech

https://doi.org/10.1016/j.bspc.2017.07.027

Highlights

  • First work to automatically detect glottal stop in cleft palate speech.

  • A large cleft palate speech database is used.

  • An automatic method for segmenting initial consonants and finals is proposed.

  • Four types of acoustic features are applied to identify glottal stop.

Abstract

Speech therapy is essential throughout the treatment of cleft palate (CP), which lasts many years. Automatic evaluation of CP speech could provide an effective diagnostic aid. The glottal stop is a typical compensatory articulation error in CP speech. It is produced by adducting the vocal folds as a substitute articulation during the pronunciation of oral pressure consonants. The presence of glottal stop has a great impact on speech intelligibility. In this work, an automatic glottal stop detection system is proposed. The CP speech database was collected by the Hospital of Stomatology, Sichuan University, which has the largest number of CP patients in China. The database includes extensive CP speech samples annotated at the phoneme level by speech-language pathologists, and its vocabulary covers all the initial consonants in Mandarin. First, an automatic method for segmenting initials and finals is proposed. Then, for the pressure initial consonants, four types of acoustic features are extracted: Mel-frequency cepstral coefficients, formants, Gammatone filtering energy features, and wavelet packet energy and Shannon entropy features. The features are classified using a k-nearest neighbor classifier. K-fold cross-validation is used to calculate the average detection accuracy, which reaches up to 93.5% using the Gammatone filtering energy feature.

Introduction

Cleft Lip (CL) and Cleft Palate (CP) are the most common congenital craniofacial deformities. The occurrence rate of cleft lip and cleft palate is 1.82‰ in China [1] and around 0.2‰ to 2.3‰ worldwide [2], [3]. Compared with cleft lip, the primary impact of cleft palate is on speech, mainly because of defects and deformities of the palatal bones and soft tissues. Cleft palate can be corrected by multiple surgeries. Its treatment includes not only morphological and structural reconstruction but also functional restoration [4], [5]. After the first stage of cleft palate repair surgery, around 30% to 50% of patients still suffer from speech disorders. Speech therapy helps resolve these CP speech problems, and consistent speech evaluation by professional speech pathologists is essential throughout the treatment of cleft palate, which lasts many years. Currently, the assessment of CP speech is performed by professional speech-language pathologists and depends strongly on their subjective judgment and experience. A computer-aided automatic evaluation system for CP speech can provide an objective and effective diagnostic aid to both speech pathologists and patients.

CP speech typically shows two types of disorder: resonance disorder and articulation disorder [6]. The most common resonance disorder in CP speech is hypernasality, which occurs on vowels only. Several studies have automatically detected the presence of hypernasality [7], [8], [9] and classified four hypernasality grades in CP speech [10], [11]. Articulation disorder is the other category of clinical characteristics of CP speech. There are far more types of articulation disorder than of resonance disorder, which makes a comprehensive analysis of articulation disorders much more complicated. Typical types of misarticulation include consonant omission, consonant substitution, consonant distortion, and compensatory articulation. Among these, the most typical compensatory errors include glottal stop, pharyngeal fricative, pharyngeal stop, posterior nasal fricative, and mid-dorsum palatal stop.

The glottal stop is a typical articulation disorder among patients with cleft palate. Published literature in both English and Chinese suggests that the occurrence rate of the glottal stop in cleft patients is 60–90% [12]. The presence of glottal stop can make speech unintelligible. Speech therapy provided by speech-language pathologists is necessary to help CP patients regain correct pronunciation habits. Furthermore, the presence of glottal stop is an indicator of whether the palatal surgery was effective; further surgery might be required to help CP patients regain a normal anatomic structure [12], [13], [14], [15], [16], [17].

The glottal stop compensatory articulation occurs when VeloPharyngeal Dysfunction (VPD) leaves inadequate intraoral air pressure for the production of pressure consonants. CP speakers learn to move the place of sound production from an anterior position to a posterior one, where there is greater air pressure. Speech with the glottal stop is perceived as a brief choking or popping sound in the throat [18]. The glottal stop is also typical among patients with cleft palate who speak other languages. Although languages differ, the diagnosis, causes, and treatment of glottal stop in CP speech are the same [19], [20], [21], [22], [23], [24], [25].

Several clinicians and speech pathologists at the Hospital of Stomatology have examined the acoustic characteristics of glottal stop, mainly through the analysis of spectrograms. These studies mainly focus on the qualitative description of the glottal stop in CP speech. Li’s work [26] analyzes the Mandarin initials k and t, recorded from 20 boys with cleft palate and 20 normal peers. The experimental results indicate that glottal stop leads to the emergence of spikes, changes in Voice Onset Time (VOT), and formant transitions in the spectrograms. Zhu’s work [27] shows, through the observation of spectrograms for the initial consonants t, s and d, that the VOT of glottal stop is shorter than that of normal speech. Chen’s research [28] indicates, through the observation of spectrograms for the initials z, c, s, j, q and x, that the formants and VOTs of CP glottal stop speech differ from those of normal speech.

Currently, very little work has been done to identify glottal stop in CP speech automatically. Xiao et al. [29] investigate automatic glottal stop detection in Mandarin CP speech. However, that work tests only one initial, d, using a limited number of speech samples. In Maier's work [30], MFCC is applied to detect glottal stop in German. The speech samples were recorded from 4 children, and the test data are limited, containing only 32 phonemes and 31 words. In this work, an automatic glottal stop detection method for CP speech is proposed for all the pressure consonants in Mandarin, using a large CP speech database. This paper is organized as follows: Section 2 describes the CP speech database; Section 3 presents the glottal stop detection method; Section 4 lists the experimental results; Section 5 gives the conclusions and discussion.

Section snippets

Cleft Palate Speech (CPS) database

The major bottleneck in the field of CP speech signal processing is the collection of CP speech data and the annotation of recordings by professional speech-language pathologists. In this work, the CP speech data are collected by the Department of Cleft Lip and Palate, Hospital of Stomatology, Sichuan University, which has the largest number of CLP patients in China. The detailed information of the CPS database, including the description of participants, vocabulary list and annotation, is

Automatic detection of glottal stop

The glottal stop occurs during the pronunciation of initial consonants only. This database contains a total of 2371 Chinese syllables. Each syllable consists of two parts: the initial and the final. In this work, the initials and finals are first segmented automatically. Then, the automatic glottal stop detection method is applied to the initials only.
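The classification step named in the abstract (a k-nearest neighbor classifier evaluated with k-fold cross-validation) can be illustrated with a minimal sketch. It is not the authors' implementation: the distance metric (Euclidean), the number of neighbors, the number of folds, the function names, and the toy two-dimensional vectors standing in for acoustic features are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    # Rank training samples by Euclidean distance to the query (assumed metric).
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def kfold_accuracy(features, labels, k_neighbors=3, n_folds=5, seed=0):
    """Average kNN accuracy over n_folds held-out folds."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds disjoint folds.
    folds = [indices[f::n_folds] for f in range(n_folds)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        train_x = [features[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        correct = sum(
            knn_predict(train_x, train_y, features[i], k_neighbors) == labels[i]
            for i in held_out
        )
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / n_folds


# Made-up toy data: two well-separated clusters, not real acoustic features.
toy_features = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.2, 0.2),
                (5.0, 5.1), (5.2, 4.9), (4.9, 5.3), (5.1, 5.0), (5.3, 5.2)]
toy_labels = ["normal"] * 5 + ["glottal_stop"] * 5
toy_accuracy = kfold_accuracy(toy_features, toy_labels)
```

In practice each feature vector would be computed from a segmented initial consonant, and each held-out fold is scored only by a classifier that never saw it.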

Experiments and results

The GSS speech database contains two types of speech recordings: normal speech and glottal stop speech. For each Chinese syllable, the automatic initial/final segmentation method is applied. The automatic segmentation results are compared with manual segmentation results: the absolute time errors between the automatic initial/final segmentation boundaries and the manual boundaries (the gold standard) are calculated.
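The boundary comparison described above amounts to a per-syllable absolute difference. A minimal sketch follows, assuming one initial/final boundary per syllable expressed in milliseconds; the function name and the toy boundary values are hypothetical and are not taken from the paper.

```python
def boundary_errors_ms(auto_boundaries_ms, manual_boundaries_ms):
    """Absolute time error between automatic and manual boundaries, per syllable."""
    if len(auto_boundaries_ms) != len(manual_boundaries_ms):
        raise ValueError("need one automatic and one manual boundary per syllable")
    return [abs(a - m) for a, m in zip(auto_boundaries_ms, manual_boundaries_ms)]


# Made-up boundary positions (ms from syllable onset) for four syllables.
toy_auto = [112.0, 95.0, 140.0, 80.0]
toy_manual = [100.0, 98.0, 125.0, 80.0]

errors = boundary_errors_ms(toy_auto, toy_manual)
mean_error = sum(errors) / len(errors)
```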

The current initial/final segmentation methods are implemented in

Conclusions and discussions

An automatic glottal stop detection system is proposed in this work. The automatic initial/final segmentation method is applied first. Current initial/final segmentation methods usually report segmentation accuracies at P18∼35, which are around 84–93% [33], [34], [35], [36], [37], [38], [39], [40]. In this work, the segmentation accuracy is 92.3% at P20 for all the syllables. Furthermore, the majority of current studies are based on speech databases of limited size. The
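The snippet does not define the P20 measure. A common reading in the segmentation literature, assumed here, is the share of automatic boundaries that fall within 20 ms of the manual boundary. Under that assumption, a tolerance-based accuracy can be sketched as follows; the function name and error values are hypothetical.

```python
def tolerance_accuracy(errors_ms, tolerance_ms=20.0):
    """Fraction of boundary errors at or below the tolerance (assumed meaning of P20)."""
    if not errors_ms:
        raise ValueError("at least one boundary error is required")
    within = sum(1 for e in errors_ms if e <= tolerance_ms)
    return within / len(errors_ms)


# Made-up absolute boundary errors in milliseconds for eight syllables.
toy_errors = [4.0, 12.5, 19.0, 20.0, 27.0, 41.0, 8.0, 15.0]
p20 = tolerance_accuracy(toy_errors)  # six of the eight errors are within 20 ms
```

Changing `tolerance_ms` gives the corresponding measure at other tolerances in the P18∼35 range mentioned above, under the same assumption.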

Acknowledgement

This work is supported by the National Natural Science Foundation of China (Grant No. 61503264).

References (56)

  • A. Harding et al.

    Characteristics of cleft palate speech

    Eur. J. Disord. Commun.

    (1996)
  • K.J. Golding-Kushner

    Therapy Techniques for Cleft Palate Speech and Related Disorders

    (2001)
  • J.E. Losee et al.

    Comprehensive Cleft Care

    (2008)
  • D. Rah et al.

    A noninvasive estimation of hypernasality using a linear predictive model

    Annals of Biomedical Engineering

    (2001)
  • P. Vijayalakshmi et al.

    Selective pole modification-based technique for the analysis and detection of hypernasality

  • P. Vijayalakshmi et al.

    Acoustic analysis and detection of hypernasality using a group delay function

    IEEE Transactions on Biomedical Engineering

    (2007)
  • L. He et al.

    Automatic evaluation of hypernasality and consonant misarticulation in cleft palate speech

    IEEE Signal Processing Letters

    (2014)
  • L. He et al.

    Automatic evaluation of Hypernasality Based on a Cleft Palate Speech Database

    Journal of Medical Systems

    (2015)
  • G. Henningsson et al.

    Universal parameters for reporting speech outcomes in individuals with cleft palate

    Cleft Palate Craniofac J.

    (2008)
  • J.E. Losee et al.

    Comprehensive cleft care

    (2008)
  • Y. Heng et al.

    A preliminary study on the articulation of older patients with cleft palate

    West China Journal of Stomatology

    (2013)
  • F.1 Peterson

    The Clinician’s Guide to Treating Cleft Palate Speech, Philadelphia

    (2006)
  • A. Ysunza et al.

    Speech outcome and maxillary growth in patients with unilateral complete cleft lip/palate operated on at 6 versus 12 months of age

    Plast. Reconstr. Surg.

    (1998)
  • D.P. Kuehn

    Speech evaluation and treatment for patients with cleft palate

    American Journal of Speech-Language Pathology

    (2003)
  • F.E. Aydınlı et al.

    Investigating the Effects of Glottal Stop Productions on Voice in Children With Cleft Palate Using Multidimensional Voice Assessment Methods

    Journal of Voice

    (2016)
  • M.C. Pamplona et al.

    A Study of Strategies for Treating Compensatory Articulation in Patients with Cleft Palate

    J. Maxillofac. Oral Surg.

    (2012)
  • H. Rina et al.

    Activation patterns in the auditory association area involved in glottal stop perception

    Journal of Oral Biosciences

    (2013)
  • V. Ferbach-Hecker et al.

    The production of stops in VCV sequences in children with a cleft palate: an acoustic study

    Proceedings of ISSP 2008

    (2008)