Automatic detection of glottal stop in cleft palate speech
Introduction
Cleft lip (CL) and cleft palate (CP) are the most common congenital craniofacial deformities. The occurrence rate of cleft lip and cleft palate is 1.82‰ in China [1] and around 0.2‰ to 2.3‰ worldwide [2], [3]. Compared with cleft lip, the primary impact of cleft palate is on speech, mainly because of defects and deformities of the palatal bones and soft tissues. Cleft palate can be corrected by multiple surgeries. Its treatment includes not only morphological and structural reconstruction but also functional restoration [4], [5]. After the first stage of cleft palate repair surgery, around 30% to 50% of patients still suffer from speech disorders. Speech therapy helps resolve these CP speech problems. Consistent speech evaluation by professional speech pathologists is essential throughout the treatment of cleft palate, which spans many years. Currently, the assessment of CP speech is performed by professional speech-language pathologists and depends strongly on their subjective judgment and experience. A computer-aided automatic evaluation system for CP speech can provide an objective and effective diagnosis to both speech pathologists and patients.
CP speech typically shows two types of disorder: resonance disorder and articulation disorder [6]. The most common resonance disorder in CP speech is hypernasality, which occurs on vowels only. Several studies have automatically detected the presence of hypernasality [7], [8], [9] and classified four hypernasality grades in CP speech [10], [11]. Articulation disorder is the other category of clinical characterization of CP speech. There are far more types of articulation disorder than of resonance disorder, which makes a comprehensive analysis of articulation disorders much more complicated. Typical types of misarticulation include consonant omission, consonant substitution, consonant distortion, and compensatory articulation. Among these, the most typical compensatory errors include the glottal stop, pharyngeal fricative, pharyngeal stop, posterior nasal fricative, and mid-dorsum palatal stop.
The glottal stop is a typical articulation disorder among patients with cleft palate. Published literature in both English and Chinese suggests that the occurrence rate of the glottal stop in cleft patients is 60-90% [12]. The presence of glottal stops makes speech unintelligible. Speech therapy provided by speech-language pathologists is necessary to help CP patients regain correct pronunciation habits. Furthermore, the presence of the glottal stop is an indicator of whether the palatal surgery has been effective. Further surgery might be required to help CP patients restore normal anatomic structure [12], [13], [14], [15], [16], [17].
Glottal stop compensatory articulation occurs when velopharyngeal dysfunction (VPD) leaves inadequate intraoral air pressure for the production of pressure consonants. CP speakers learn to shift an anterior place of articulation posteriorly, to where more air pressure is available. Speech with the glottal stop is perceived as a brief choking or popping sound in the throat [18]. The glottal stop is also typical among patients with cleft palate who speak other languages. Although languages differ, the diagnosis, causes, and treatment of the glottal stop in CP speech are the same [19], [20], [21], [22], [23], [24], [25].
Several clinicians and speech pathologists at the Hospital of Stomatology have examined the acoustic characteristics of the glottal stop, mainly through the analysis of spectrograms. These studies focus mainly on the qualitative description of the glottal stop in CP speech. Li’s work [26] analyzes the Mandarin initials k and t, recorded from 20 boys with cleft palate and 20 normal peers. The experimental results indicate that the glottal stop produces spikes, changes in Voice Onset Time (VOT), and formant transitions in the spectrograms. Zhu’s work [27] shows, through the observation of spectrograms for the initial consonants t, s and d, that the VOT of the glottal stop is shorter than that of normal speech. Chen’s research [28] indicates, through the observation of spectrograms for the initials z, c, s, j, q and x, that the formants and VOTs of CP glottal stop speech differ from those of normal speech.
Currently, very little work has been done to identify the glottal stop in CP speech automatically. Xiao et al. [29] investigate automatic glottal stop detection in Mandarin CP speech. However, that work tests only one initial, d, using a limited number of speech samples. In Maier's work [30], Mel-frequency cepstral coefficients (MFCC) are applied to detect the glottal stop in German. The speech samples are recorded from 4 children, and the test data are limited to 32 phonemes and 31 words. In this work, an automatic glottal stop detection method for CP speech is proposed that covers all the pressure consonants in Mandarin and uses a large CP speech database. This paper is organized as follows: Section 2 describes the CP speech database; Section 3 presents the glottal stop detection method; Section 4 lists the experimental results; Section 5 gives the conclusions and discussions.
Cleft Palate Speech (CPS) database
The major bottleneck in the field of CP speech signal processing is the collection of CP speech data and the annotation of recordings by professional speech-language pathologists. In this work, the CP speech data are collected by the Department of Cleft Lip and Palate, Hospital of Stomatology, Sichuan University, which has the largest number of CLP patients in China. Detailed information on the CPS database, including the description of participants, vocabulary list and annotation, is
Automatic detection of glottal stop
The glottal stop occurs only during the pronunciation of initial consonants. This database contains a total of 2371 Chinese syllables. Each syllable consists of two parts: the initial and the final. In this work, the initials and finals are first segmented automatically. The automatic glottal stop detection method is then applied to the initials only.
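The two-stage flow described above (split each syllable into initial and final, then run detection on the initial only) can be sketched as follows. This is a minimal illustration, not the segmentation or detection method used in this work: the boundary rule is a simple short-time-energy threshold swapped in for clarity, and the names `frame_energies`, `split_initial_final` and `detect_on_initial`, the frame sizes, and the `ratio` threshold are all assumptions.

```python
from typing import Callable, List, Sequence


def frame_energies(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Short-time energy (sum of squared samples) of each analysis frame."""
    last_start = max(len(signal) - frame_len, 0)
    return [
        sum(x * x for x in signal[start:start + frame_len])
        for start in range(0, last_start + 1, hop)
    ]


def split_initial_final(
    signal: Sequence[float],
    frame_len: int = 160,   # assumed frame length in samples
    hop: int = 80,          # assumed hop size in samples
    ratio: float = 0.5,     # assumed fraction of peak energy marking the final
) -> int:
    """Return the sample index where the final is assumed to start.

    Illustrative heuristic only: the boundary is the start of the first
    frame whose energy reaches `ratio` times the peak frame energy.
    """
    energies = frame_energies(signal, frame_len, hop)
    peak = max(energies)
    if peak == 0.0:
        return 0  # silent input: no usable boundary
    for index, energy in enumerate(energies):
        if energy >= ratio * peak:
            return index * hop
    return 0


def detect_on_initial(
    signal: Sequence[float],
    classifier: Callable[[Sequence[float]], bool],
) -> bool:
    """Segment the syllable, then pass only the initial to the classifier."""
    boundary = split_initial_final(signal)
    return classifier(signal[:boundary])
```

The `classifier` argument is a placeholder for whatever glottal stop detector is used; the sketch only shows that the detector never sees the final.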
Experiments and results
The GSS speech database contains two types of speech recordings: normal speech and glottal stop speech. The automatic initial/final segmentation method is applied to each Chinese syllable. The automatic segmentation results are compared with manual segmentation results, which serve as the gold standard, and the absolute time errors between the automatic and manual initial/final boundaries are calculated.
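The boundary comparison described above can be sketched as below. It is a minimal illustration under stated assumptions: boundaries are given in milliseconds, the function names are hypothetical, and the tolerance-based score assumes that a measure such as P20 denotes the share of automatic boundaries lying within 20 ms of the manual boundary, which this text does not define explicitly.

```python
from typing import List, Sequence


def absolute_boundary_errors(
    auto_ms: Sequence[float], manual_ms: Sequence[float]
) -> List[float]:
    """Absolute time error, per syllable, between automatic and manual boundaries."""
    if len(auto_ms) != len(manual_ms):
        raise ValueError("need one automatic boundary per manual boundary")
    return [abs(a - m) for a, m in zip(auto_ms, manual_ms)]


def tolerance_accuracy(errors_ms: Sequence[float], tolerance_ms: float) -> float:
    """Fraction of boundaries whose absolute error is within the tolerance.

    Assumed reading of a "P<t>" score: tolerance_ms=20.0 would correspond to P20.
    """
    if not errors_ms:
        raise ValueError("no boundaries to score")
    within = sum(1 for error in errors_ms if error <= tolerance_ms)
    return within / len(errors_ms)
```

As a hypothetical example, automatic boundaries at 100, 152, 98 and 210 ms scored against manual boundaries at 105, 150, 120 and 200 ms give absolute errors of 5, 2, 22 and 10 ms, so `tolerance_accuracy(errors, 20.0)` is 0.75.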
The current initial/final segmentation methods are implemented in
Conclusions and discussions
An automatic glottal stop detection system is proposed in this work. Automatic initial/final segmentation is performed first. Existing initial/final segmentation methods usually report segmentation accuracy at P18 to P35, with values of around 84–93% [33], [34], [35], [36], [37], [38], [39], [40]. In this work, the segmentation accuracy is 92.3% at P20 over all the syllables. Furthermore, most existing studies are based on speech databases of limited size. The
Acknowledgement
This work is supported by the National Natural Science Foundation of China (Grant No. 61503264).
References (56)
- et al. Evaluation of VPI-assessment with videofluoroscopy and nasoendoscopy. Br. J. Plast. Surg. (2005)
- et al. A comparative trial of two modalities of speech intervention for compensatory articulation in cleft palate children, phonologic approach versus articulatory approach. International Journal of Pediatric Otorhinolaryngology (1999)
- et al. Rayleigh modeling of teager energy operated perceptual wavelet packet coefficients for enhancing noisy speech. Speech Communication (2017)
- et al. Phase modification for increasing the intelligibility of telephone speech in near-end noise conditions-evaluation of two methods. Speech Communication (2016)
- et al. Speech enhancement based on AR model parameters estimation. Speech Communication (2016)
- et al. Acoustic features for speech recognition based on Gammatone filterbank and instantaneous frequency. Speech Communication (2011)
- et al. Entropy coding of compressed feature parameters for distributed speech recognition. Speech Communication (2010)
- et al. Assessment and Treatment of Cleft Palate Speech. Beijing (2015)
- et al. Epidemiology of oral clefts: An international perspective.
- et al. Cleft lip and palate: understanding genetic and environmental influences. Nat. Rev. Genet. (2011)
- Characteristics of cleft palate speech. Eur. J. Disord. Commun.
- Therapy techniques for cleft palate speech & Related disorder. Comprehensive Cleft Care
- A noninvasive estimation of hypernasality using a linear predictive model. Annals of Biomedical Engineering
- Selective pole modification-based technique for the analysis and detection of hypernasality.
- Acoustic analysis and detection of hypernasality using a group delay function. IEEE Transactions on Biomedical Engineering
- Automatic evaluation of hypernasality and consonant misarticulation in cleft palate speech. IEEE Signal Processing Letters
- Automatic evaluation of Hypernasality Based on a Cleft Palate Speech Database. Journal of Medical Systems
- Universal parameters for reporting speech outcomes in individuals with cleft palate. Cleft Palate Craniofac. J.
- Comprehensive cleft care.
- A preliminary study on the articulation of older patients with cleft palate. West China Journal of Stomatology
- The clinician’s Guide to treating cleft palate speech. Philadelphia
- Speech outcome and maxillary growth in patients with unilateral complete cleft lip/palate operated on at 6 versus 12 months of age. Plast. Reconstr. Surg.
- Speech evaluation and treatment for patients with cleft palate. American Journal of Speech-Language Pathology
- Investigating the Effects of Glottal Stop Productions on Voice in Children With Cleft Palate Using Multidimensional Voice Assessment Methods. Journal of Voice
- A Study of Strategies for Treating Compensatory Articulation in Patients with Cleft Palate. J. Maxillofac. Oral Surg.
- Activation patterns in the auditory association area involved in glottal stop perception. Journal of Oral Biosciences
- The production of stops in VCV sequences in children with a cleft palate an acoustic study. Proceedings of ISSP 2008