Automatic Segmentation of Continuous Speech on Word Level Based on Supra-segmental Features

International Journal of Speech Technology

Abstract

This article presents a cross-lingual study of Hungarian and Finnish on the segmentation of continuous speech at the word and phrase level, based on an examination of supra-segmental parameters. A word-level segmenter has been developed that can indicate word boundaries with acceptable precision in both languages. The ultimate aim is to increase the robustness of speech recognition at the language-modelling level: detecting word and phrase boundaries can significantly reduce the search space during decoding. Search-space reduction is especially important for agglutinative languages, where the large number of inflected word forms makes the vocabulary, and hence the search space, very large.
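
The search-space argument can be made concrete with a deliberately simplified count. This is our own toy illustration, not an analysis from the paper: ignoring the vocabulary entirely, a sequence of n syllables has n - 1 gaps, each of which may or may not be a word boundary, giving 2^(n-1) candidate segmentations. Every boundary detected with certainty fixes one gap and halves the count. The function name `count_segmentations` and the assumption of error-free boundary detections are hypothetical.

```python
from itertools import product


def count_segmentations(n_syllables: int, known_boundaries: set[int]) -> int:
    """Count the ways to split n_syllables syllables into words (toy model).

    Gap g (1 <= g < n_syllables) lies between syllables g-1 and g and is either
    a word boundary or not. Gaps in known_boundaries are forced to be boundaries,
    as if a prosodic detector had found them with certainty (an assumption).
    """
    gaps = range(1, n_syllables)
    count = 0
    # Enumerate every boundary / no-boundary choice for the gaps.
    for choice in product((False, True), repeat=max(n_syllables - 1, 0)):
        boundaries = {g for g, is_boundary in zip(gaps, choice) if is_boundary}
        # Keep only segmentations that respect every detected boundary.
        if known_boundaries <= boundaries:
            count += 1
    return count


if __name__ == "__main__":
    print(count_segmentations(10, set()))   # 512 = 2**9
    print(count_segmentations(10, {3, 7}))  # 128 = 2**7
```

In a real decoder the count also depends on the lexicon and the language model; the toy count only shows why each reliable boundary removes hypotheses from the search.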

In Hungarian and Finnish, stress, when present, always falls on the first syllable of the word. Consequently, any stressed syllable that can be detected must be at the beginning of a word. We have developed different algorithms following either a rule-based or a data-driven approach, and compared the rule-based algorithms with HMM-based methods. The best results were obtained by the data-driven algorithms using the time series of fundamental frequency and energy together. Syllable length was found to be much less effective and was therefore discarded. Using supra-segmental features, word boundaries can be marked with high accuracy, even though not all of them are found. The method is easily adaptable to other languages with fixed stress; to investigate this, we adapted our data-driven method to Finnish and obtained similar results.
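
The core inference above (stress falls only on word-initial syllables, so a detected stress marks a word start) can be sketched as a simple rule-based detector. This is a minimal illustration under our own assumptions, not the authors' algorithm, whose best results came from HMM-based data-driven methods. The per-syllable mean F0 and energy inputs, the `Syllable` type, the `detect_word_starts` and `precision_recall` helpers, and the thresholds `f0_rise` and `energy_rise_db` are all hypothetical. Only F0 and energy are used, in line with syllable length having been discarded.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    f0_hz: float      # mean fundamental frequency over the syllable (hypothetical input)
    energy_db: float  # mean energy over the syllable, in dB (hypothetical input)


def detect_word_starts(syllables: list[Syllable],
                       f0_rise: float = 1.1,
                       energy_rise_db: float = 3.0) -> list[int]:
    """Return indices of syllables judged stressed, i.e. candidate word starts.

    Hypothetical rule: syllable i (i >= 1) counts as stressed when its F0 is at
    least f0_rise times the previous syllable's F0 AND its energy is at least
    energy_rise_db above the previous syllable's. With word-initial fixed
    stress, each such syllable begins a word.
    """
    starts = []
    for i in range(1, len(syllables)):
        prev, cur = syllables[i - 1], syllables[i]
        f0_ok = cur.f0_hz >= prev.f0_hz * f0_rise
        energy_ok = cur.energy_db - prev.energy_db >= energy_rise_db
        if f0_ok and energy_ok:
            starts.append(i)
    return starts


def precision_recall(predicted: list[int], reference: list[int]) -> tuple[float, float]:
    """Exact-match precision and recall of predicted boundary indices."""
    pred, ref = set(predicted), set(reference)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up contour: words start at syllables 0, 3 and 5; the word at 5 is unstressed.
    utterance = [
        Syllable(180, 70), Syllable(150, 64), Syllable(140, 62),
        Syllable(175, 69), Syllable(150, 63),
        Syllable(145, 62), Syllable(140, 61),
    ]
    found = detect_word_starts(utterance)
    print(found)                            # [3]
    print(precision_recall(found, [3, 5]))  # (1.0, 0.5)
```

With these made-up values the F0 and energy rise at syllable 3 is found, but the word starting at syllable 5 carries no stress cue. Every marked boundary is therefore correct (precision 1.0) while half of the interior boundaries are missed (recall 0.5): the boundaries that are marked are reliable, even though not all of them are found.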

Author information

Correspondence to Klára Vicsi.

About this article

Cite this article

Vicsi, K., Szaszák, G. Automatic Segmentation of Continuous Speech on Word Level Based on Supra-segmental Features. Int J Speech Technol 8, 363–370 (2005). https://doi.org/10.1007/s10772-006-8534-z

  • DOI: https://doi.org/10.1007/s10772-006-8534-z
