Automatic phonetic segmentation of Hindi speech using hidden Markov model

  • Open Forum
AI & SOCIETY

Abstract

In this paper, we study the performance of a baseline hidden Markov model (HMM) for the segmentation of speech signals. The model is applied to a single-speaker segmentation task using a Hindi speech database. The automatic phoneme segmentation framework that was developed imitates the human phoneme segmentation process. A set of 44 Hindi phonemes was chosen for the segmentation experiment, in which we used continuous density hidden Markov models (CDHMMs) with mixtures of Gaussian distributions. A left-to-right topology with no skip states was selected because it is effective in speech recognition, being consistent with the natural way in which spoken words are articulated. The system accepts speech utterances along with their orthographic “transcriptions” and generates segmentation information for the speech. The corpus was used to develop context-independent HMMs for each of the Hindi phonemes. The system was trained on numerous sentences relevant to providing information to Metro Rail passengers, and it was validated against a few manually segmented speech utterances. The evaluation of the experiments shows that the best performance is obtained with a combination of two Gaussian mixtures and five HMM states. A category-wise phoneme error analysis has been performed, and the performance of the phonetic segmentation is reported. The HMM modeling was implemented in C++ using Microsoft Visual Studio 2005, and the system is designed to run on the Windows operating system. The goal of this study is the automatic segmentation of speech at the phonetic level.
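To make the alignment idea concrete, the following is a minimal Python sketch of how a left-to-right HMM with no skip states can turn per-frame state scores into phoneme boundaries through Viterbi forced alignment. It is an illustration under stated assumptions, not the authors' C++ implementation: the function names, the self-loop probability of 0.6, the phoneme labels and the toy log-likelihood table are all hypothetical. In a real system the frame scores would come from the Gaussian mixture densities of the trained phoneme models.

```python
import math

NEG_INF = float("-inf")

def left_to_right_log_transitions(n_states, stay_prob=0.6):
    """Log transition matrix: each state may only loop or move to the next one."""
    log_a = [[NEG_INF] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            log_a[i][i] = math.log(stay_prob)            # self-loop
            log_a[i][i + 1] = math.log(1.0 - stay_prob)  # advance, no skips
        else:
            log_a[i][i] = 0.0                            # final state only loops
    return log_a

def forced_align(log_b, log_a):
    """Viterbi alignment; log_b[t][s] is the log-likelihood of frame t in state s.

    The path is forced to start in the first state and end in the last one.
    Returns the best state index for every frame.
    """
    n_frames, n_states = len(log_b), len(log_a)
    delta = [[NEG_INF] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    delta[0][0] = log_b[0][0]
    for t in range(1, n_frames):
        for s in range(n_states):
            best_prev, best = s, delta[t - 1][s] + log_a[s][s]
            if s > 0:
                cand = delta[t - 1][s - 1] + log_a[s - 1][s]
                if cand > best:
                    best_prev, best = s - 1, cand
            delta[t][s] = best + log_b[t][s]
            back[t][s] = best_prev
    if delta[-1][-1] == NEG_INF:
        raise ValueError("too few frames to reach the final state")
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

def phoneme_segments(path, state_phones):
    """Merge consecutive frames with the same phoneme into (label, start, end)."""
    segments, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or state_phones[path[t]] != state_phones[path[t - 1]]:
            segments.append((state_phones[path[start]], start, t))
            start = t
    return segments

# Toy example: two states for a hypothetical phoneme "a", one for "m".
state_phones = ["a", "a", "m"]
hi, lo = math.log(0.8), math.log(0.1)
# Frames 0-1 fit state 0 best, frames 2-3 state 1, frames 4-5 state 2.
log_b = [[hi, lo, lo], [hi, lo, lo], [lo, hi, lo],
         [lo, hi, lo], [lo, lo, hi], [lo, lo, hi]]
path = forced_align(log_b, left_to_right_log_transitions(3))
segments = phoneme_segments(path, state_phones)
print(path)      # [0, 0, 1, 1, 2, 2]
print(segments)  # [('a', 0, 4), ('m', 4, 6)]
```

Segment ends are exclusive frame indices. Multiplying them by the frame shift of the feature extraction (a value the abstract does not state) would give boundary times that can be compared with a manual segmentation.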

Author information

Corresponding author

Correspondence to Archana Balyan.

About this article

Cite this article

Balyan, A., Agrawal, S.S. & Dev, A. Automatic phonetic segmentation of Hindi speech using hidden Markov model. AI & Soc 27, 543–549 (2012). https://doi.org/10.1007/s00146-012-0386-2

  • DOI: https://doi.org/10.1007/s00146-012-0386-2
