
An efficient state detection of a person by fusion of acoustic and alcoholic features using various classification algorithms

Published in: International Journal of Speech Technology

Abstract

In this paper, we present a novel approach for extracting emotional information together with the state of intoxication from speech. Conventional methods use features to identify either the intoxication state or the emotional state separately. In this work, we extract efficient features from the Alcohol Language Corpus for alcohol state detection and from the Berlin emotional speech dataset for emotional behavior, and the two feature sets are fused after extraction. Through the proposed approach, we can obtain information about the driver, such as the drunken state and the emotional state, at the same time. The paper deals with classifying the driver's state as alcoholic or non-alcoholic, as well as their emotional behavior, such as happy, angry, sad, fearful, or neutral, from speech signals. The main application of this work is to safeguard the lives of people who use vehicles daily by alerting them to accident-prone situations. We have used classifiers such as Support Vector Machine, K-Nearest Neighbor, Random Forest, Gradient Boosting, and Extremely Randomized Trees. The outcome is the detection of both the emotion and the intoxication state of the driver.
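
To make the pipeline concrete, the sketch below illustrates one plausible reading of the approach: frame-level acoustic features (MFCCs, zero-crossing rate, short-time energy) are summarized per utterance, the vectors drawn from the alcohol and emotion recordings are fused by concatenation, and the five classifiers named in the abstract are compared. The librosa feature calls, the placeholder file names, the pairing of utterances across corpora, and the joint label scheme are illustrative assumptions, not the authors' exact method.

```python
# Minimal sketch: acoustic feature extraction, feature-level fusion, and
# classifier comparison. Assumes librosa and scikit-learn; the wav paths,
# labels, and pairing strategy below are placeholders, not the paper's setup.
import numpy as np
import librosa
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              ExtraTreesClassifier)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def acoustic_features(wav_path):
    """Extract a fixed-length acoustic feature vector from one utterance."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # spectral envelope
    zcr = librosa.feature.zero_crossing_rate(y)          # voicing / noisiness
    rms = librosa.feature.rms(y=y)                       # short-time energy
    # Summarize frame-level features by their per-utterance mean and std.
    return np.concatenate([np.r_[f.mean(axis=1), f.std(axis=1)]
                           for f in (mfcc, zcr, rms)])

# Placeholder lists: replace with the full ALC and Emo-DB file/label lists.
alc_files, alc_labels = ["alc_001.wav"], ["intoxicated"]
emo_files, emo_labels = ["emo_001.wav"], ["anger"]

# Feature-level fusion: concatenate the vectors extracted from the paired
# alcohol and emotion utterances (the pairing itself is an assumption here).
X = np.array([np.concatenate([acoustic_features(a), acoustic_features(e)])
              for a, e in zip(alc_files, emo_files)])
y = np.array([f"{s}/{e}" for s, e in zip(alc_labels, emo_labels)])

classifiers = {
    "SVM": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=200),
    "Gradient Boosting": GradientBoostingClassifier(),
    "Extra Trees": ExtraTreesClassifier(n_estimators=200),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)             # 5-fold accuracy
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Concatenating the two per-utterance vectors is one straightforward way to realise "fusion after feature extraction"; other schemes (e.g. separate classifiers with decision-level fusion) would fit the same description.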



Author information


Corresponding author

Correspondence to V. Viswanath Shenoi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Shenoi, V.V., Kuchibhotla, S. & Kotturu, P. An efficient state detection of a person by fusion of acoustic and alcoholic features using various classification algorithms. Int J Speech Technol 23, 625–632 (2020). https://doi.org/10.1007/s10772-020-09726-7

