Abstract
In this paper, we study the performance of a baseline hidden Markov model (HMM) for the segmentation of speech signals. It is applied to a single-speaker segmentation task using a Hindi speech database. The automatic phoneme segmentation framework developed here imitates the human phoneme segmentation process. A set of 44 Hindi phonemes was chosen for the segmentation experiment, in which we used continuous density hidden Markov models (CDHMMs) with Gaussian mixture output distributions. A left-to-right topology with no skip states was selected because it is effective in speech recognition: it is consistent with the natural, sequential way in which spoken words are articulated. The system accepts speech utterances along with their orthographic transcriptions and generates segmentation information for the speech. The speech corpus was used to develop context-independent HMMs for each of the Hindi phonemes. The system was trained on numerous sentences of the kind used to provide information to Metro Rail passengers, and it was validated against a few manually segmented speech utterances. The evaluation shows that the best performance is obtained with two Gaussian mixture components and five HMM states. A category-wise phoneme error analysis was performed, and the performance of the phonetic segmentation is reported. The HMMs were implemented in C++ using Microsoft Visual Studio 2005, and the system is designed to run on the Windows operating system. The goal of this study is automatic segmentation of speech at the phonetic level.
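One common way to turn a transcription plus left-to-right, no-skip phoneme HMMs into phone boundaries is Viterbi forced alignment: the phoneme models named by the transcription are concatenated into a single state chain, and the most likely path of frames through that chain marks where each phoneme starts and ends. The sketch below illustrates that idea in Python under stated assumptions; it is not the authors' C++ implementation, and whether they used exactly this procedure is an assumption here. It assumes the per-state Gaussian mixtures (diagonal covariance) are already trained, takes feature vectors (for example MFCCs) as given, and uses a fixed, hypothetical self-loop probability `self_loop` in place of trained transition probabilities. The names `GMM`, `Phone`, and `force_align` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class GMM:
    """Diagonal-covariance Gaussian mixture: the output density of one HMM state."""
    weights: List[float]
    means: List[List[float]]
    variances: List[List[float]]

    def log_likelihood(self, x: Sequence[float]) -> float:
        # Log-density of each mixture component, combined with log-sum-exp.
        comps = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            comps.append(ll)
        top = max(comps)
        return top + math.log(sum(math.exp(c - top) for c in comps))


Phone = Tuple[str, List[GMM]]  # (phoneme label, its emitting states in order)


def force_align(frames: List[Sequence[float]], phones: List[Phone],
                self_loop: float = 0.6) -> List[Tuple[str, int, int]]:
    """Viterbi alignment of frames to a left-to-right, no-skip state chain.

    Returns (label, start_frame, end_frame_exclusive) for each phone.
    The fixed self_loop probability is an assumption, not a trained value.
    """
    if not 0.0 < self_loop < 1.0:
        raise ValueError("self_loop must lie strictly between 0 and 1")
    # Concatenate the phone models of the transcription into one chain,
    # remembering which phone (by position) each state belongs to.
    chain = [(p, gmm) for p, (_, gmms) in enumerate(phones) for gmm in gmms]
    T, N = len(frames), len(chain)
    if N == 0 or T < N:
        raise ValueError("no-skip topology needs at least one frame per state")
    log_stay, log_next = math.log(self_loop), math.log(1.0 - self_loop)
    neg_inf = float("-inf")
    delta = [[neg_inf] * N for _ in range(T)]  # best log score ending in state j at t
    back = [[0] * N for _ in range(T)]          # predecessor state for backtracking
    delta[0][0] = chain[0][1].log_likelihood(frames[0])  # must start in state 0
    for t in range(1, T):
        for j in range(N):
            # Only two moves exist: stay in state j, or advance from j - 1.
            stay = delta[t - 1][j] + log_stay
            move = delta[t - 1][j - 1] + log_next if j > 0 else neg_inf
            best, back[t][j] = (stay, j) if stay >= move else (move, j - 1)
            delta[t][j] = best + chain[j][1].log_likelihood(frames[t])
    # The path must end in the final state; trace predecessors back to t = 0.
    path = [N - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    # A phone boundary occurs wherever the owning phone index changes.
    segments, start = [], 0
    for t in range(1, T + 1):
        if t == T or chain[path[t]][0] != chain[path[t - 1]][0]:
            segments.append((phones[chain[path[t - 1]][0]][0], start, t))
            start = t
    return segments
```

Because the chain has no skip transitions, every phone in the transcription receives at least as many frames as it has states, which is why `force_align` rejects utterances shorter than the chain. With five states per phone, as in the best configuration reported above, each phone therefore occupies at least five frames.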
Cite this article
Balyan, A., Agrawal, S.S. & Dev, A. Automatic phonetic segmentation of Hindi speech using hidden Markov model. AI & Soc 27, 543–549 (2012). https://doi.org/10.1007/s00146-012-0386-2