
Emotion Aided Dialogue Act Classification for Task-Independent Conversations in a Multi-modal Framework

Published in: Cognitive Computation

Abstract

Dialogue act classification (DAC) provides significant insight into understanding the communicative intention of the user. Numerous machine learning (ML) and deep learning (DL) approaches have been proposed over the years in this regard for task-oriented and task-independent conversations in textual form. However, the effect of the emotional state on determining the dialogue acts (DAs) has not been studied in depth in a multi-modal framework involving text, audio, and visual features. Conversations are intrinsically determined and regulated by direct, exquisite, and subtle emotions, and the emotional state of a speaker has a considerable effect on the intentional or pragmatic content of an utterance. This paper thoroughly investigates the role of emotions in the automatic identification of DAs in task-independent conversations in a multi-modal framework (specifically audio and text). A DL-based multi-tasking network for DAC and emotion recognition (ER) has been developed, incorporating attention to facilitate the fusion of different modalities. IEMOCAP, an open-source, benchmark multi-modal ER dataset, has been manually annotated with its corresponding DAs to make it suitable for multi-task learning and to further advance research in multi-modal DAC. The proposed multi-task framework attains an improvement of 2.5% over its single-task DAC counterpart on the manually annotated IEMOCAP dataset. Comparisons with several baselines establish the efficacy of the proposed approach and the importance of incorporating emotion while identifying DAs.
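The abstract describes the architecture only at a high level: a text encoder and an audio encoder whose representations are fused through attention and shared by two classification heads (DAs and emotions). The minimal Python/Keras sketch below (the article's footnote points to Keras) illustrates one way such a multi-task, attention-fused model can be wired up; all layer sizes, feature dimensions, label counts, and names are illustrative assumptions, not the authors' reported configuration.

    # Minimal multi-task sketch: text + audio branches, attention-based fusion,
    # and separate dialogue-act and emotion heads. All sizes are assumptions.
    from tensorflow.keras import layers, Model

    MAX_WORDS, VOCAB, EMB_DIM = 50, 20000, 300   # assumed text settings
    AUDIO_DIM = 384                              # assumed utterance-level audio feature size
    N_DA, N_EMO = 12, 6                          # assumed numbers of DA and emotion labels

    # Text branch (a pre-trained embedding such as GloVe would normally initialise it).
    txt_in = layers.Input(shape=(MAX_WORDS,), name="text_ids")
    x = layers.Embedding(VOCAB, EMB_DIM, mask_zero=True)(txt_in)
    x = layers.Bidirectional(layers.LSTM(128))(x)            # utterance-level text vector

    # Audio branch over utterance-level acoustic features (e.g. openSMILE statistics).
    aud_in = layers.Input(shape=(AUDIO_DIM,), name="audio_feats")
    a = layers.Dense(128, activation="relu")(aud_in)

    # Attention-based fusion: project both modalities to a common space,
    # learn a scalar weight per modality, and take the weighted sum.
    stack = layers.Concatenate(axis=1)(
        [layers.Reshape((1, 256))(layers.Dense(256)(x)),
         layers.Reshape((1, 256))(layers.Dense(256)(a))])    # (batch, 2, 256)
    scores = layers.Dense(1)(stack)                          # (batch, 2, 1)
    weights = layers.Softmax(axis=1)(scores)                 # attention over modalities
    fused = layers.Flatten()(layers.Dot(axes=1)([weights, stack]))  # (batch, 256)

    # Two task-specific heads sharing the fused representation.
    da_out = layers.Dense(N_DA, activation="softmax", name="dialogue_act")(fused)
    emo_out = layers.Dense(N_EMO, activation="softmax", name="emotion")(fused)

    model = Model([txt_in, aud_in], [da_out, emo_out])
    model.compile(optimizer="adam",
                  loss={"dialogue_act": "sparse_categorical_crossentropy",
                        "emotion": "sparse_categorical_crossentropy"},
                  loss_weights={"dialogue_act": 1.0, "emotion": 0.5})
    model.summary()

Sharing the fused representation between the two heads is what lets the emotion supervision act as an auxiliary signal for the DA head, which is the intuition behind the reported multi-task gain.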



Notes

  1. The DA-annotated dataset will be made publicly available to the research community.

  2. https://keras.io/


Acknowledgments

Dr. Sriparna Saha gratefully acknowledges the Young Faculty Research Fellowship (YFRF) Award, supported by the Visvesvaraya PhD Scheme for Electronics and IT, Ministry of Electronics and Information Technology (MeitY), Government of India, and implemented by Digital India Corporation (formerly Media Lab Asia), for carrying out this research.

Author information


Corresponding author

Correspondence to Tulika Saha.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Saha, T., Gupta, D., Saha, S. et al. Emotion Aided Dialogue Act Classification for Task-Independent Conversations in a Multi-modal Framework. Cogn Comput 13, 277–289 (2021). https://doi.org/10.1007/s12559-019-09704-5
