Combined classification method for prosodic stress recognition in Farsi language

International Journal of Speech Technology

Abstract

Stress in speech conveys additional information to the listener but complicates automatic speech recognition. The first step toward recognizing stressed speech is detecting the boundaries of stress. In this research, prosodic stress boundaries were extracted from stressed Farsi sentences. Acoustic and prosodic features were used to train hidden Markov models (HMMs) for stress boundary recognition, and the most effective features were selected with the fast correlation-based filter (FCBF) method. The influence of different feature sets on stress boundary recognition performance was evaluated, and a combined classifier scheme was proposed on the basis of this evaluation. Experimental results showed that the proposed combined model improved stress boundary detection performance by 12% compared with the baseline model, giving a final recognition rate of 85% for prosodic stress boundaries.
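
The abstract names FCBF as the feature selection step without describing it. As a rough illustration only, the sketch below follows the generic FCBF procedure as it is commonly described: rank features by symmetrical uncertainty (SU) with the class label, then drop any feature that is at least as correlated with an already selected feature as it is with the class. It is not the authors' implementation. The feature names (`f0`, `f0_dup`, `energy`), the toy values, the `delta` threshold, and the assumption that features have already been discretized are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both sequences are constant
    joint = entropy(list(zip(x, y)))
    mutual_information = hx + hy - joint
    return 2.0 * mutual_information / (hx + hy)


def fcbf(features, labels, delta=0.0):
    """Return feature names kept by an FCBF-style filter.

    features: dict mapping a feature name to a list of discrete values.
    delta: relevance threshold (assumption: features with SU <= delta
           are discarded, so zero-relevance features drop out at 0.0).
    """
    relevance = {
        name: symmetrical_uncertainty(column, labels)
        for name, column in features.items()
    }
    # Keep relevant features, most relevant first.
    ranked = sorted(
        (name for name in relevance if relevance[name] > delta),
        key=lambda name: relevance[name],
        reverse=True,
    )
    selected = []
    while ranked:
        best = ranked.pop(0)
        selected.append(best)
        # Drop features that are redundant with respect to `best`.
        ranked = [
            name
            for name in ranked
            if symmetrical_uncertainty(features[best], features[name])
            < relevance[name]
        ]
    return selected


# Hypothetical toy data: 1 = stressed, 0 = unstressed.
labels = [0, 0, 1, 1, 0, 0, 1, 1]
features = {
    "f0": [0, 0, 1, 1, 0, 0, 1, 1],       # tracks the label exactly
    "f0_dup": [1, 1, 0, 0, 1, 1, 0, 0],   # inverted copy, fully redundant
    "energy": [0, 1, 0, 1, 0, 1, 0, 1],   # independent of the label
}
selected = fcbf(features, labels)
```

In this toy case `energy` carries no information about the label and `f0_dup` is redundant with `f0`, so only `f0` survives. Continuous acoustic and prosodic measurements would need to be discretized before a filter of this kind could be applied.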

Author information

Corresponding author

Correspondence to D. Gharavian.

Cite this article

Gharavian, D., Sheikhan, M. & Ghasemi, S.S. Combined classification method for prosodic stress recognition in Farsi language. Int J Speech Technol 21, 333–341 (2018). https://doi.org/10.1007/s10772-018-9508-7

  • DOI: https://doi.org/10.1007/s10772-018-9508-7
