Abstract
In this paper, we present an approach for extracting emotional information together with the state of intoxication from speech signals. Conventional methods use separate features either for intoxication detection or for emotion recognition. In this work, we extract features from the Alcohol Language Corpus for intoxication detection and from the Berlin emotional speech database for emotion recognition, and fuse the two feature sets after extraction. Through the proposed approach, we obtain both the driver's intoxication state and emotional state at the same time. The paper addresses driver state classification: whether the driver is intoxicated or sober, and which emotion (happy, angry, sad, fearful, neutral, etc.) the speech conveys. The main application of this work is to safeguard drivers and alert them in accident-prone situations. We use classifiers such as Support Vector Machine, K-Nearest Neighbor, Random Forest, Gradient Boosting, and Extremely Randomized Trees to detect the emotion and intoxication state of the driver.
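The pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature matrices are synthetic stand-ins (the assumed dimensions of 13 acoustic and 5 intoxication-related features are hypothetical), but the feature-level fusion by concatenation and the five classifiers match those named in the abstract.

```python
# Hypothetical sketch of the paper's feature-fusion pipeline:
# per-utterance emotion features and intoxication features are
# concatenated (feature-level fusion), then classified.
import numpy as np
from sklearn.ensemble import (ExtraTreesClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
emotion_feats = rng.normal(size=(n, 13))   # assumed: 13 acoustic features
alcohol_feats = rng.normal(size=(n, 5))    # assumed: 5 intoxication cues
fused = np.hstack([emotion_feats, alcohol_feats])  # feature-level fusion
labels = rng.integers(0, 2, size=n)        # 0 = sober, 1 = intoxicated

X_tr, X_te, y_tr, y_te = train_test_split(
    fused, labels, random_state=0)

# The five classifiers evaluated in the paper.
classifiers = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Extra Trees": ExtraTreesClassifier(random_state=0),
}
scores = {name: clf.fit(X_tr, y_tr).score(X_te, y_te)
          for name, clf in classifiers.items()}
for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.2f}")
```

With real data, the same structure would be trained twice (or with a joint label set) to produce both the intoxication decision and the emotion label from a single fused feature vector.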
Cite this article
Shenoi, V.V., Kuchibhotla, S. & Kotturu, P. An efficient state detection of a person by fusion of acoustic and alcoholic features using various classification algorithms. Int J Speech Technol 23, 625–632 (2020). https://doi.org/10.1007/s10772-020-09726-7