Inverse Filtering Based Feature for Analysis of Vowel Nasalization

Jyotishi, Debasish; Dandapat, Samarendra

doi:10.1007/978-3-030-34872-4_50


Conference paper
First Online: 25 November 2019


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11942)

Abstract

Vowel nasalization is present in almost every Indic language. Detecting it can improve the accuracy of Automatic Speech Recognition (ASR) systems designed for Indian languages, and it also provides significant clinical information about the vocal tract. In pursuit of acoustic parameters for detecting nasalized vowels, most researchers have extensively analyzed their spectral-domain characteristics. In this work, we use an inverse filtering based technique to develop a novel feature that represents the amount of nasalization present in a vowel. The feature exploits two properties: the nasal filter is invariant across different nasalized vowels, and oral and nasal speech add after radiation. Because the feature measures the amount of nasalization, it can be used both for detecting vowel nasalization and for clinical purposes. A statistical analysis of the feature shows that it separates oral vowels and nasalized vowels well.


1 Introduction

Nasal sounds are produced when the glottal wave passes through the nasal cavity, a passage controlled by the velum: to utter a nasal sound, the velum lowers and lets the glottal wave pass through the nasal cavity [10]. The fraction of the glottal wave that passes through the nasal cavity determines the degree of nasalization. Nasal sounds fall broadly into two categories. The first comprises nasal murmurs or nasal consonants, e.g. /m/ and /n/, which are produced with the oral tract decoupled. The second comprises nasalized vowels and nasalized semivowels, which are produced with the oral and nasal tracts coupled. In Indian languages, a nasalized vowel can be uttered deliberately, as marked by a ‘Matra’ in the script; this is called phonemic nasalization. In English, by contrast, vowel nasalization arises mostly from co-articulation. In co-articulatory nasalization the velum lowers early, in anticipation of a following nasal consonant, and thus nasalizes an oral vowel. Conversely, when a nasal consonant precedes an oral vowel, the velum may remain lowered for a moment, which also causes co-articulatory nasalization. Another kind of nasalization is functional nasalization, which is caused by a functional disorder of the velopharyngeal mechanism.

Nasalized vowels contribute to the vocabulary of almost every language, and there are word pairs, such as dot and don’t, that are distinguished partly by vowel nasalization. However, vowel nasalization is difficult to detect, which makes it a challenge for ASR systems. Pruthi [9] showed that the accuracy of a Hidden Markov Model (HMM) based ASR system decreases if nasalized vowels are not detected, so detecting vowel nasalization is important for improving ASR performance. Nasalized vowels can be regarded as vowels with a high degree of nasalization, and the degree of nasalization carries significant clinical information as well as information about speech intelligibility.

Many researchers have studied the spectral-domain properties of nasalized vowels and proposed acoustic parameters for them. Fant showed that nasalization decreases the amplitude of the first formant and increases its bandwidth [2]. House and Stevens [5] observed a spectral prominence around 1000 Hz and a reduction of the second formant. Fujimura and Lindqvist [3] studied the effects of nasalization on the vowels /aaa/, /ooo/ and /uuu/ and observed an upward shift of the first formant and the introduction of pole-zero pairs near the first and third formants. Glass and Zue performed an extensive statistical analysis of the spectral characteristics of nasalized vowels and proposed six features for their automatic detection [4]. Chen found the difference between the first formant and an extra peak to be a promising measure of nasality [1]. Vijayalakshmi et al. exploited this property to detect hypernasality [11]: they used a modified group delay function [8] to resolve the first formant and the extra resonance that appears in hypernasal speech due to nasalization, but reported that the feature is limited when detecting nasalized vowels in the speech of healthy speakers. Pruthi analyzed 37 acoustic parameters from the existing literature and selected nine knowledge-based acoustic parameters for detecting nasalized vowels [9].

In this work, we propose an inverse filtering based feature that accounts for the amount of nasalization present in a vowel. The rest of the paper is organized as follows. Section 2 describes the database. Section 3 proposes the inverse filtering based feature. Section 4 analyzes different nasalized vowels, and Sect. 5 summarizes the findings.

2 Database Description

In this study, speech data were collected from 15 speakers: the vowels /e/, /u/ and /i/, their nasalized counterparts /en/, /un/ and /in/, respectively, and the word ‘summer’, which contains the nasal consonant /m/ [11]. The nasal-consonant segment was marked manually in every speech file. All recordings were made in a speech recording studio, so the data are free of background noise. The recordings were made with the Audacity software at a sampling frequency of 48000 Hz. Since the speech signal carries little information above 5000 Hz, the data were resampled to 11025 Hz, whose Nyquist frequency (5512.5 Hz) still covers this band.
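The resampling step can be sketched with SciPy's polyphase resampler. This is a minimal sketch, not the authors' code: the paper does not say which resampling method was used, and the helper names `resampling_factors` and `resample_to_analysis_rate` are hypothetical.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

FS_RECORDED = 48000  # recording rate stated in Sect. 2
FS_ANALYSIS = 11025  # analysis rate stated in Sect. 2


def resampling_factors(fs_in: int, fs_out: int):
    """Smallest integer (up, down) pair with fs_out / fs_in == up / down."""
    g = gcd(fs_in, fs_out)
    return fs_out // g, fs_in // g


def resample_to_analysis_rate(x: np.ndarray, fs_in: int = FS_RECORDED,
                              fs_out: int = FS_ANALYSIS) -> np.ndarray:
    """Polyphase resampling; resample_poly applies its own anti-aliasing low-pass filter."""
    up, down = resampling_factors(fs_in, fs_out)
    return resample_poly(x, up, down)


if __name__ == "__main__":
    print(resampling_factors(FS_RECORDED, FS_ANALYSIS))  # (147, 640)
    one_second = np.zeros(FS_RECORDED)  # placeholder signal: 1 s at 48 kHz
    print(len(resample_to_analysis_rate(one_second)))  # 11025
```

A polyphase resampler fits here because the rate change is an exact rational ratio (147/640), which it implements directly.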

3 Inverse Filtering Based Feature

A nasalized vowel is the sum of an oral sound and a nasal sound. Across different nasalized vowels, the oral filter changes its characteristics while the nasal filter remains invariant [7], and this invariant nasal filter has characteristics similar to those of the filter that produces the nasal murmur [7]. Hence a single nasal filter can be modeled for all nasalized vowels, and it can be estimated from the nasal murmur. The coupled oral and nasal tracts can be modeled as in Fig. 1.

This view of speech production lets us model the speech spectrum as,

$$\begin{aligned} S(\omega )&=k \times G(\omega ) \times N(\omega )+(1-k) \times G(\omega ) \times O(\omega )\nonumber \\&=G(\omega ) \times N(\omega ) \times (k+(1-k) \times \frac{O(\omega )}{N(\omega )}) \end{aligned}$$

(1)

In Eq. 1, ‘k’ represents the fraction of the glottal wave that passes through the invariant nasal filter, $N(\omega )$ and $O(\omega )$ represent the nasal and oral filters, respectively, and $G(\omega )$ and $S(\omega )$ represent the glottal wave and the speech signal, respectively.
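As a sanity check on Eq. 1, its additive and factored forms can be compared numerically. The sketch below assumes toy second-order resonators for $N(\omega )$ and $O(\omega )$ (hypothetical values, not estimated from the database) and a flat glottal spectrum; the helpers `resonator` and `eq1_both_forms` are illustrative only.

```python
import numpy as np
from scipy.signal import freqz

FS = 11025  # analysis rate from Sect. 2


def resonator(f_hz: float, r: float, fs: float = FS) -> np.ndarray:
    """Denominator A(z) of a two-pole resonator with poles r * exp(+-j 2 pi f / fs)."""
    theta = 2 * np.pi * f_hz / fs
    return np.array([1.0, -2.0 * r * np.cos(theta), r * r])


def eq1_both_forms(G, N, O, k):
    """S(w) from the additive form and from the factored form of Eq. 1."""
    s_sum = k * G * N + (1 - k) * G * O
    s_factored = G * N * (k + (1 - k) * O / N)
    return s_sum, s_factored


freqs = np.linspace(50.0, 5000.0, 256)  # evaluation grid in Hz
_, N = freqz([1.0], resonator(1250.0, 0.97), worN=freqs, fs=FS)  # toy nasal filter
_, O = freqz([1.0], resonator(500.0, 0.95), worN=freqs, fs=FS)   # toy oral filter
G = np.ones_like(N)  # flat glottal spectrum (simplification)

s_sum, s_factored = eq1_both_forms(G, N, O, k=0.3)
print(np.allclose(s_sum, s_factored))  # True
```

The factored form is only an algebraic rearrangement; its value appears once both sides are multiplied by $N(\omega )^{-1}$, as in Eq. 3.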

Let the nasal filter be written in terms of its Z zeros and P poles as,

$$\begin{aligned} N(\omega )=\frac{\prod _{z=1}^{Z} (\omega -\omega _z)}{\prod _{p=1}^{P} (\omega -\omega _p)} \end{aligned}$$

(2)

Substituting Eq. 2 into Eq. 1 and multiplying both sides by $N(\omega )^{-1}$, we get,

$$\begin{aligned} S(\omega )&=G(\omega ) \times N(\omega ) \times \left(k+(1-k) \times O(\omega ) \times \frac{\prod _{p=1}^{P} (\omega -\omega _p)}{\prod _{z=1}^{Z} (\omega -\omega _z)}\right)\\ \implies S(\omega ) \times N(\omega )^{-1}&=G(\omega ) \times \left(k+(1-k) \times O(\omega ) \times \frac{\prod _{p=1}^{P} (\omega -\omega _p)}{\prod _{z=1}^{Z} (\omega -\omega _z)}\right) \end{aligned}$$

(3)

Evaluating Eq. 3 at a nasal pole $\omega _p$ that has no oral pole nearby, $O(\omega _p)$ remains finite while the product over the nasal poles vanishes, so the oral term drops out and we get,

$$\begin{aligned} S(\omega _p) \times N(\omega _p)^{-1}=G(\omega _p) \times (k)\nonumber \\ \implies k=\frac{S(\omega _p) \times N(\omega _p)^{-1}}{G(\omega _p)} \end{aligned}$$

(4)

In Eq. 4, ‘k’ represents the amount of nasalization.

Equation 4 shows that, given the speech signal $S(\omega )$, the glottal wave $G(\omega )$ and a mathematical model of the nasal filter $N(\omega )$, the amount of nasalization present in the speech signal can be computed. In any speech application $S(\omega )$ is available, and $G(\omega )$ can be estimated from it; the only extra information needed is the person-specific nasal filter.
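Equation 4 can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the nasal filter is assumed to be all-pole, $N(z)=1/A(z)$ with the LP gain ignored, so that $N(\omega )^{-1}=A(e^{j\omega })$; spectra are evaluated with a single-frequency DTFT; and, since the paper does not state how the complex ratio is reduced to a real number, its magnitude is taken. All function names are hypothetical.

```python
import numpy as np
from scipy.signal import lfilter


def dtft_at(x: np.ndarray, fs: float, f_hz: float) -> complex:
    """DTFT of x at one frequency f_hz, i.e. evaluated on the unit circle."""
    n = np.arange(len(x))
    return complex(np.sum(x * np.exp(-2j * np.pi * f_hz * n / fs)))


def inverse_nasal_at(a_nasal: np.ndarray, fs: float, f_hz: float) -> complex:
    """N(w)^{-1} = A(e^{jw}) = sum_i a_i e^{-jwi} for an all-pole nasal model (ASSUMPTION)."""
    z_inv = np.exp(-2j * np.pi * f_hz / fs)
    return complex(np.polyval(a_nasal[::-1], z_inv))


def nasalization_k(speech, glottal, a_nasal, fs, f_hz) -> float:
    """Eq. 4: k = S(w_p) N(w_p)^{-1} / G(w_p); the magnitude is returned (ASSUMPTION)."""
    s = dtft_at(speech, fs, f_hz)
    g = dtft_at(glottal, fs, f_hz)
    return abs(s * inverse_nasal_at(a_nasal, fs, f_hz) / g)


if __name__ == "__main__":
    fs = 11025
    a_nasal = np.array([1.0, -1.5, 0.7])  # toy 2nd-order nasal model (hypothetical)
    g = np.zeros(512)
    g[0] = 1.0                             # unit impulse standing in for G
    s = lfilter([0.4], a_nasal, g)         # purely nasal path scaled by k = 0.4
    print(round(nasalization_k(s, g, a_nasal, fs, 1250.0), 6))  # 0.4
```

With real recordings the oral term of Eq. 3 only approximately vanishes and the residual spectrum can be noisy, so the value is an estimate of ‘k’ rather than an exact recovery as in this toy case.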

In [6] we showed that the first three formants of the nasal murmur occur around 250 Hz, 1250 Hz and 2500 Hz, and in [10] it is shown that /i/ and /e/ have no formant near 1250 Hz. Therefore, to compute ‘k’ for /i/ and /e/, we use the nasal pole corresponding to the formant near 1250 Hz, and for /u/ we use the pole near 2500 Hz. The expression for ‘k’ is evaluated on the unit circle rather than at the pole location itself.
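The choice of evaluation frequency can be expressed as picking the estimated nasal pole closest to a vowel-dependent target. The targets (1250 Hz for /i/ and /e/, 2500 Hz for /u/) come from the text; the nearest-pole rule, the function names and the example pole set are assumptions for illustration.

```python
import numpy as np

# Target nasal-formant frequencies per vowel, as stated in Sect. 3.
TARGET_HZ = {"i": 1250.0, "e": 1250.0, "u": 2500.0}


def pole_frequencies_hz(poles: np.ndarray, fs: float) -> np.ndarray:
    """Frequencies (Hz) of the poles lying in the upper half of the z-plane."""
    upper = poles[poles.imag > 0]
    return np.angle(upper) * fs / (2 * np.pi)


def evaluation_frequency(nasal_poles: np.ndarray, vowel: str, fs: float = 11025) -> float:
    """Frequency of the nasal pole nearest the vowel's target; k is then evaluated
    on the unit circle at this frequency (ASSUMPTION: nearest-pole rule)."""
    freqs = pole_frequencies_hz(nasal_poles, fs)
    return float(freqs[np.argmin(np.abs(freqs - TARGET_HZ[vowel]))])


if __name__ == "__main__":
    fs = 11025
    # Hypothetical nasal poles near 260, 1230 and 2480 Hz, radius 0.98.
    upper = 0.98 * np.exp(2j * np.pi * np.array([260.0, 1230.0, 2480.0]) / fs)
    poles = np.concatenate([upper, upper.conj()])
    print(evaluation_frequency(poles, "i", fs), evaluation_frequency(poles, "u", fs))
```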

The assumptions made in this study are as follows:

(i) The effect of evaluating ‘k’ on the unit circle instead of at the pole location is negligible. This assumption is reasonable because the poles of the nasal filter lie very close to the unit circle.
(ii) No nasal zero lies near the chosen pole location. Zeros of the oral filter, if any, are also not taken into account in this study.
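Assumption (i) can be illustrated numerically. For a conjugate pole pair at radius $r$ and angle $\theta$, the inverse filter $A(z)=(1-re^{j\theta}z^{-1})(1-re^{-j\theta}z^{-1})$ evaluated on the unit circle at $z=e^{j\theta}$ equals $(1-r)(1-re^{-2j\theta})$, so its magnitude shrinks roughly in proportion to $1-r$, and the oral term of Eq. 3, which contains this factor, is strongly attenuated when the pole is close to the unit circle. The sketch below, with hypothetical radii, only demonstrates this scaling and is not part of the authors' procedure.

```python
import numpy as np


def inverse_pair_gain(r: float, theta: float) -> float:
    """|A(e^{j theta})| for A(z) = (1 - r e^{j theta} z^-1)(1 - r e^{-j theta} z^-1)."""
    z = np.exp(1j * theta)  # point on the unit circle at the pole angle
    a = (1 - r * np.exp(1j * theta) / z) * (1 - r * np.exp(-1j * theta) / z)
    return float(abs(a))


if __name__ == "__main__":
    fs, f_p = 11025, 1250.0  # evaluation frequency used for /i/ and /e/
    theta = 2 * np.pi * f_p / fs
    for r in (0.90, 0.95, 0.99):  # hypothetical pole radii
        print(r, round(inverse_pair_gain(r, theta), 4))
```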

4 Analysis of Nasalised Vowels

Computing ‘k’ requires the person-specific nasal filter ($N(\omega )$), the speech signal ($S(\omega )$) and the glottal wave ($G(\omega )$). The person-specific invariant nasal filter is estimated from the nasal murmur using linear prediction (LP) analysis: the coefficients of a 12th-order LP model are estimated for each 20 ms windowed segment of the murmur, the poles of every frame are computed from these coefficients, and the poles are averaged over frames to obtain the invariant nasal filter for each person. In this analysis, $G(\omega )$ is taken as the residual of a 12th-order LP filter of the speech signal. Figure 2 shows the pole-zero plot of the nasal filter of one person; the crosses mark the pole locations, and the poles marked in green correspond to the estimated invariant nasal filter.
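The estimation of the nasal filter and of $G(\omega )$ can be sketched as follows. This is a minimal sketch, not the authors' code: the Hamming window, the 50% frame overlap and the pole-averaging rule (sort each frame's upper-half-plane poles by angle, keep only frames with a full set of complex pairs, and average element-wise) are assumptions, since the paper does not specify them. LP coefficients are obtained with the autocorrelation method, solving the Toeplitz normal equations with `scipy.linalg.solve_toeplitz`.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter


def lpc(frame: np.ndarray, order: int = 12) -> np.ndarray:
    """Autocorrelation-method LP: returns [1, a_1, ..., a_p] with A(z) = 1 + sum_i a_i z^-i."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - lag], frame[lag:]) for lag in range(order + 1)])
    if r[0] <= 0.0:  # silent frame: return the trivial (identity) inverse filter
        return np.concatenate(([1.0], np.zeros(order)))
    return np.concatenate(([1.0], solve_toeplitz(r[:order], -r[1:])))


def average_nasal_poles(murmur: np.ndarray, fs: float, order: int = 12, frame_ms: float = 20.0):
    """Average per-frame LP poles of the nasal murmur into one invariant nasal filter."""
    n = int(round(fs * frame_ms / 1000.0))  # 20 ms -> 220 samples at 11025 Hz
    hop = n // 2                             # ASSUMPTION: 50 % overlap
    win = np.hamming(n)                      # ASSUMPTION: Hamming window
    per_frame = []
    for start in range(0, len(murmur) - n + 1, hop):
        poles = np.roots(lpc(murmur[start:start + n] * win, order))
        upper = poles[poles.imag > 0]
        if len(upper) == order // 2:         # ASSUMPTION: keep frames with order/2 complex pairs
            per_frame.append(upper[np.argsort(np.angle(upper))])
    if not per_frame:
        raise ValueError("no frame yielded a full set of complex pole pairs")
    mean_upper = np.mean(per_frame, axis=0)
    poles = np.concatenate([mean_upper, mean_upper.conj()])
    return poles, np.real(np.poly(poles))    # poles and coefficients of A_nasal(z)


def lp_residual(frame: np.ndarray, order: int = 12) -> np.ndarray:
    """Glottal-wave estimate: inverse-filter the frame with its own LP polynomial A(z)."""
    return lfilter(lpc(frame, order), [1.0], frame)
```

Frames whose 12th-order model has real poles instead of six complex pairs are simply discarded here; other matching or clustering rules for pole averaging are equally defensible.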

The value of ‘k’ is computed at five frequencies and averaged, to suppress spurious peaks that may arise from division by the residual spectrum. The five frequencies are the selected frequency and the two frequencies below and above it at 5 Hz spacing, i.e. f - 10, f - 5, f, f + 5 and f + 10 Hz for a selected frequency f. Values of ‘k’ were calculated for the vowels /i/, /e/ and /u/ and summarized in box plots; Figs. 3, 4 and 5 show the box plots for /i/, /e/ and /u/, respectively.
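The averaging step can be sketched as below. `k_at` stands for any function returning ‘k’ at a single frequency (for example an implementation of Eq. 4); the names are hypothetical, and the defaults reproduce the five points f - 10, f - 5, f, f + 5 and f + 10 Hz described above.

```python
from typing import Callable

import numpy as np


def evaluation_grid(f0_hz: float, step_hz: float = 5.0, n_side: int = 2) -> np.ndarray:
    """Selected frequency plus n_side neighbours on each side, step_hz apart."""
    return f0_hz + step_hz * np.arange(-n_side, n_side + 1)


def averaged_k(k_at: Callable[[float], float], f0_hz: float,
               step_hz: float = 5.0, n_side: int = 2) -> float:
    """Average 'k' over the grid to damp spurious peaks from dividing by the residual spectrum."""
    return float(np.mean([k_at(float(f)) for f in evaluation_grid(f0_hz, step_hz, n_side)]))


if __name__ == "__main__":
    print(evaluation_grid(1250.0))
```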

The box plots show that ‘k’ is higher for nasalized vowels than for their oral counterparts, and that ‘k’ lies within the range 0 to 1, as expected. They also indicate that the proposed feature has high discriminatory capability, which is confirmed by the F-ratios. Table 1 lists the median values of ‘k’. Note that ‘k’ is non-zero even for oral vowels; possible reasons are the approximations made in the derivation and vibrations of the velum during the utterance of oral vowels.

Table 1. Median values of ‘k’


Table 2. ANOVA values


F-ratios and p-values of ‘k’ were obtained for the different vowels using one-way analysis of variance (ANOVA). One-way ANOVA tests whether several groups share the same mean; a high F-ratio together with a small p-value indicates that the group means differ significantly. Table 2 shows high F-ratios and very low p-values, indicating that oral and nasalized vowels are well discriminated by ‘k’.
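The F-ratio follows the standard one-way ANOVA definition: the between-group mean square divided by the within-group mean square. The sketch below computes it directly and takes the p-value from the F distribution via `scipy.stats.f.sf`; the two groups in the example are made-up numbers for illustration, not values from this study.

```python
import numpy as np
from scipy import stats


def one_way_anova(*groups):
    """Return (F, p) for a one-way ANOVA over the given groups of 'k' values."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.concatenate(groups)
    grand_mean = pooled.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(pooled) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return float(f_ratio), float(stats.f.sf(f_ratio, df_between, df_within))


if __name__ == "__main__":
    oral = [1.0, 2.0, 3.0]       # illustrative numbers only
    nasalized = [4.0, 5.0, 6.0]  # illustrative numbers only
    f_ratio, p = one_way_anova(oral, nasalized)
    print(f_ratio)  # 13.5
```

`scipy.stats.f_oneway` returns the same statistic and can be used directly; the explicit version only makes the definition of the F-ratio visible.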

5 Conclusions

In this study, we proposed a simple inverse filtering based technique to derive a feature that accounts for the amount of nasalization present in a vowel. Because nasalized vowels contain more nasalization than oral vowels, the feature is useful for detecting nasalized vowels, and a statistical analysis showed that its values are well separated for nasalized and oral vowels. Since the feature is derived from the degree of nasalization, it may correlate well with perceptual nasality scores.

References

[1] Chen, M.Y.: Acoustic parameters of nasalized vowels in hearing-impaired and normal-hearing speakers. J. Acoust. Soc. Am. 98(5), 2443–2453 (1995)
[2] Fant, G.: Acoustic theory of speech production (1960)
[3] Fujimura, O., Lindqvist, J.: Sweep-tone measurements of vocal-tract characteristics. J. Acoust. Soc. Am. 49(2B), 541–558 (1971)
[4] Glass, J., Zue, V.: Detection of nasalized vowels in American English. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 1985, vol. 10, pp. 1569–1572. IEEE (1985)
[5] House, A.S., Stevens, K.N.: Analog studies of the nasalization of vowels. J. Speech Hear. Disord. 21(2), 218–232 (1956)
[6] Jyotishi, D., Deb, S., Abhishek, A., Dandapat, S.: Experimental analysis on effect of nasal tract on nasalised vowels. In: Tanveer, M., Pachori, R.B. (eds.) Machine Intelligence and Signal Analysis. AISC, vol. 748, pp. 727–737. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-0923-6_62
[7] Jyotishi, D., Deb, S., Dandapat, S.: A novel feature for nasalised vowels and characteristic analysis of nasal filter. In: 2018 Twenty Fourth National Conference on Communications (NCC), pp. 1–5. IEEE (2018)
[8] Murthy, H.A., Gadde, V.: The modified group delay function and its application to phoneme recognition. In: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), vol. 1, pp. I-68. IEEE (2003)
[9] Pruthi, T.: Analysis, vocal-tract modeling and automatic detection of vowel nasalization. Ph.D. thesis, University of Maryland, College Park (2007)
[10] Rabiner, L.R., Schafer, R.W.: Digital Processing of Speech Signals, vol. 100. Prentice-Hall, Englewood Cliffs, New Jersey (1978)
[11] Vijayalakshmi, P., Reddy, M.R., O’Shaughnessy, D.: Acoustic analysis and detection of hypernasality using a group delay function. IEEE Trans. Biomed. Eng. 54(4), 621–629 (2007)


Author information

Authors and Affiliations

Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, 781039, India
Debasish Jyotishi & Samarendra Dandapat


Corresponding author

Correspondence to Debasish Jyotishi.

Editor information

Editors and Affiliations

Tezpur University, Tezpur, India
Bhabesh Deka
Indian Statistical Institute, Kolkata, India
Pradipta Maji
Indian Statistical Institute, Kolkata, India
Sushmita Mitra
Tezpur University, Tezpur, India
Dhruba Kumar Bhattacharyya
Indian Institute of Technology Guwahati, Guwahati, India
Prabin Kumar Bora
Indian Statistical Institute, Kolkata, India
Sankar Kumar Pal


About this paper

Cite this paper

Jyotishi, D., Dandapat, S. (2019). Inverse Filtering Based Feature for Analysis of Vowel Nasalization. In: Deka, B., Maji, P., Mitra, S., Bhattacharyya, D., Bora, P., Pal, S. (eds) Pattern Recognition and Machine Intelligence. PReMI 2019. Lecture Notes in Computer Science, vol. 11942. Springer, Cham. https://doi.org/10.1007/978-3-030-34872-4_50


DOI: https://doi.org/10.1007/978-3-030-34872-4_50
Published: 25 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34871-7
Online ISBN: 978-3-030-34872-4
eBook Packages: Computer Science, Computer Science (R0)
