Combined classification method for prosodic stress recognition in Farsi language

Gharavian, D.; Sheikhan, M.; Ghasemi, Sh. S.

doi:10.1007/s10772-018-9508-7

Published: 17 April 2018

International Journal of Speech Technology, volume 21, pages 333–341 (2018)

D. Gharavian¹,
M. Sheikhan² &
Sh. S. Ghasemi²

Abstract

Stress in speech conveys additional information to a listener but complicates automatic speech recognition. The first step toward recognizing stressed speech is detecting the boundaries of the stressed regions. In this research, prosodic stress boundaries were extracted from stressed Farsi sentences. Acoustic and prosodic features were used to train hidden Markov models for stress boundary recognition, and the most effective features were selected with the fast correlation-based filter (FCBF) method. The influence of different feature sets on stress boundary recognition performance was evaluated, and a combined classifier scheme was proposed on the basis of this evaluation. Experimental results showed that the proposed combined model improved stress boundary detection performance by 12% compared with the baseline model, giving a final recognition rate of 85% for prosodic stress boundaries.
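
The abstract names FCBF as the feature selection step but gives no implementation details. The following is a minimal sketch of the general FCBF idea (rank features by symmetrical uncertainty with the class label, then drop features that are more strongly correlated with an already selected feature than with the label). It is not the authors' code: the feature names `pitch`, `energy`, and `rate`, the toy data, and the threshold `delta` are all assumptions made for illustration, and the features are assumed to be already discretized.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def fcbf(features, labels, delta=0.0):
    """Return the names of the selected features, most relevant first.

    features: dict mapping a (hypothetical) feature name to a list of
              discrete values, one per sample.
    labels:   list of class labels, one per sample.
    delta:    relevance threshold (an assumed default, not from the paper).
    """
    relevance = {name: symmetrical_uncertainty(col, labels)
                 for name, col in features.items()}
    # Keep only features whose relevance exceeds delta, best first.
    ranked = sorted((n for n in relevance if relevance[n] > delta),
                    key=lambda n: relevance[n], reverse=True)
    selected = []
    while ranked:
        best = ranked.pop(0)
        selected.append(best)
        # Drop features that are at least as correlated with `best`
        # as they are with the class label (redundant features).
        ranked = [n for n in ranked
                  if symmetrical_uncertainty(features[best], features[n])
                  < relevance[n]]
    return selected


# Toy data: 1 marks a stressed frame, 0 an unstressed frame (illustrative only).
labels = [0, 0, 1, 1, 0, 0, 1, 1]
toy_features = {
    "pitch":  [0, 0, 1, 1, 0, 0, 1, 1],   # tracks the label exactly
    "energy": [1, 1, 0, 0, 1, 1, 0, 0],   # redundant with pitch
    "rate":   [0, 1, 0, 1, 0, 1, 0, 1],   # independent of the label
}
selected = fcbf(toy_features, labels)
print(selected)
```

In this toy case `energy` carries the same information as `pitch`, so it is removed as redundant, and `rate` is removed because it has zero relevance to the label. Continuous acoustic and prosodic measurements would need to be discretized before this kind of entropy-based ranking can be applied.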

Author information

Authors and Affiliations

Department of Electrical Engineering, Shahid Beheshti University, Tehran, Iran
D. Gharavian
Department of Electrical Engineering, Islamic Azad University, South Tehran Branch, Tehran, Iran
M. Sheikhan & Sh. S. Ghasemi

Corresponding author

Correspondence to D. Gharavian.

About this article

Cite this article

Gharavian, D., Sheikhan, M. & Ghasemi, S.S. Combined classification method for prosodic stress recognition in Farsi language. Int J Speech Technol 21, 333–341 (2018). https://doi.org/10.1007/s10772-018-9508-7

Received: 30 March 2017
Accepted: 09 April 2018
Published: 17 April 2018
Issue Date: June 2018
DOI: https://doi.org/10.1007/s10772-018-9508-7
