Abstract:
This paper presents a new technique for glottal inverse filtering using a distributed model of the vocal tract. A discrete state space model has been constructed for the ...Show MoreMetadata
Abstract:
This paper presents a new technique for glottal inverse filtering using a distributed model of the vocal tract. A discrete state space model has been constructed for the speech production system by combining the concatenated tube model of the vocal tract and Liljencrants-Fant (LF) model of the glottal flow derivative waveform. An adaptive system identification technique, based on extended Kalman filtering, has been used for estimation of the states and model parameters from continuous speech. The glottal signal, represented by the LF model, is piecewise differentiable in one glottal cycle. Hence, the hybrid system has been characterized by separate models during two different modes. Multiple model estimation has been performed by switching between the two models at the mode jumps. The open phase of the glottal cycle has been considered as Mode 1; whereas, the return phase and closed phase combined has been taken as Mode 2. The starting point of Mode 1, also known as glottal opening instant, was estimated by observing formant modulation, which remains negligible during closed phase, and starts to increase at the onset of opening. The starting point of Mode 2, also known as the glottal closing instant, was computed by peak-picking from linear prediction (LP) residual signal. The proposed method estimates the glottal waveform as well as changes in flow occurring at different sections of the vocal tract during speech production. This technique has been found to be accurate and robust to variations in pitch as compared to other LP-based methods in the literature. The method also estimates the air pressure distribution at different sections of the vocal tract.
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing ( Volume: 24, Issue: 7, July 2016)