Automatic prediction of engagement in human-human and human-machine dyadic and multiparty interaction scenarios could greatly aid in evaluating the success of communication. A corpus of eight face-to-face dyadic casual conversations was recorded and used as the basis for an engagement study, which examined the effectiveness of several methods of engagement-level recognition. A convolutional neural network (CNN) based analysis proved the most effective.
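The abstract does not specify the CNN architecture used, but the core idea of convolutional feature extraction from visual frames can be illustrated in miniature. The sketch below, in dependency-free Python, shows the basic building blocks (convolution, ReLU, global pooling) applied to a toy 4x4 "frame"; the frame, kernel, and layer choices are purely illustrative assumptions, not the paper's method.

```python
# Illustrative sketch only: convolution -> ReLU -> global average pooling,
# the kind of feature-extraction pipeline a CNN-based engagement recogniser
# would apply to video frames. Nothing here reproduces the paper's model.

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image (list of lists)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    """Element-wise rectified linear activation."""
    return [[max(0.0, v) for v in row] for row in fmap]

def global_average_pool(fmap):
    """Collapse a feature map to one scalar feature."""
    vals = [v for row in fmap for v in row]
    return sum(vals) / len(vals)

# Toy 4x4 "frame" with a vertical edge, and a 2x2 edge-detecting kernel
# (both hypothetical, chosen only to make the arithmetic easy to follow).
frame = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]

feature = global_average_pool(relu(conv2d(frame, kernel)))
print(feature)  # a single pooled activation responding to the edge
```

In a real system, many such learned kernels would be stacked in layers and the pooled features fed to a classifier over discrete engagement levels.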
Cite as: Huang, Y., Gilmartin, E., Campbell, N. (2016) Conversational Engagement Recognition Using Auditory and Visual Cues. Proc. Interspeech 2016, 590-594, doi: 10.21437/Interspeech.2016-846
@inproceedings{huang16_interspeech,
  author={Yuyun Huang and Emer Gilmartin and Nick Campbell},
  title={{Conversational Engagement Recognition Using Auditory and Visual Cues}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={590--594},
  doi={10.21437/Interspeech.2016-846}
}