Abstract
In this paper, we present an approach for extracting emotional information together with the state of intoxication from speech signals. Conventional methods use separate features either for intoxication detection or for emotion recognition. In this work, we extract features from the Alcohol Language Corpus for intoxication detection and from the Berlin emotional speech database for emotion recognition, and fuse the two feature sets after extraction. Through the proposed approach, we obtain both the driver's intoxication state and emotional state at the same time. The paper addresses driver state classification: whether the driver is intoxicated or sober, and which emotion (happy, angry, sad, fearful, neutral, etc.) the speech conveys. The main application of this work is to safeguard drivers and alert them in accident-prone situations. We use classifiers such as Support Vector Machine, K-Nearest Neighbor, Random Forest, Gradient Boosting, and Extremely Randomized Trees to detect the emotion and intoxication state of the driver.
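The pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature matrices are synthetic stand-ins (the assumed dimensions of 13 acoustic and 5 intoxication-related features are hypothetical), but the feature-level fusion by concatenation and the five classifiers match those named in the abstract.

```python
# Hypothetical sketch of the paper's feature-fusion pipeline:
# per-utterance emotion features and intoxication features are
# concatenated (feature-level fusion), then classified.
import numpy as np
from sklearn.ensemble import (ExtraTreesClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
emotion_feats = rng.normal(size=(n, 13))   # assumed: 13 acoustic features
alcohol_feats = rng.normal(size=(n, 5))    # assumed: 5 intoxication cues
fused = np.hstack([emotion_feats, alcohol_feats])  # feature-level fusion
labels = rng.integers(0, 2, size=n)        # 0 = sober, 1 = intoxicated

X_tr, X_te, y_tr, y_te = train_test_split(
    fused, labels, random_state=0)

# The five classifiers evaluated in the paper.
classifiers = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Extra Trees": ExtraTreesClassifier(random_state=0),
}
scores = {name: clf.fit(X_tr, y_tr).score(X_te, y_te)
          for name, clf in classifiers.items()}
for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.2f}")
```

With real data, the same structure would be trained twice (or with a joint label set) to produce both the intoxication decision and the emotion label from a single fused feature vector.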
Cite this article
Shenoi, V.V., Kuchibhotla, S. & Kotturu, P. An efficient state detection of a person by fusion of acoustic and alcoholic features using various classification algorithms. Int J Speech Technol 23, 625–632 (2020). https://doi.org/10.1007/s10772-020-09726-7