Detecting head movements in video-recorded dyadic conversations

Published: 16 October 2018

Abstract

This paper is about the automatic recognition of head movements in videos of face-to-face dyadic conversations. We present an approach in which head movement recognition is cast as a multimodal frame classification problem based on visual and acoustic features. The visual features comprise velocity, acceleration, and jerk values associated with head movements, while the acoustic ones are pitch and intensity measurements from the co-occurring speech. We present the results obtained by training and testing a number of classifiers on manually annotated data from two conversations. The best-performing classifier, a Multilayer Perceptron trained on all the features, achieves an accuracy of 0.75 and outperforms the mono-modal baseline classifier.
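
The frame-classification setup described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data here is synthetic, and the toolkit (scikit-learn), the hidden-layer size, and the iteration count are all assumptions. Each frame is represented by three visual features (velocity, acceleration, jerk) and two acoustic features (pitch, intensity), and a Multilayer Perceptron predicts whether the frame belongs to a head movement.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic per-frame feature vectors standing in for the paper's features:
# three visual (velocity, acceleration, jerk) and two acoustic (pitch,
# intensity) values per video frame. Frames within a head movement
# (label 1) are drawn from a shifted distribution relative to still
# frames (label 0).
n_frames = 1000
X_move = rng.normal(loc=1.0, scale=1.0, size=(n_frames // 2, 5))
X_still = rng.normal(loc=0.0, scale=1.0, size=(n_frames // 2, 5))
X = np.vstack([X_move, X_still])
y = np.array([1] * (n_frames // 2) + [0] * (n_frames // 2))

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A small Multilayer Perceptron, echoing the paper's best-performing
# classifier type (architecture and hyperparameters are illustrative).
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"frame classification accuracy: {acc:.2f}")
```

In the paper itself the per-frame labels come from manual annotation of two conversations rather than synthetic draws, and the multimodal MLP is compared against a mono-modal baseline.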


Cited By

  • (2024) Phonetic differences between affirmative and feedback head nods in German Sign Language (DGS): A pose estimation study. PLOS ONE 19(5): e0304040. DOI: 10.1371/journal.pone.0304040. Online publication date: 30-May-2024.
  • (2024) An Outlook for AI Innovation in Multimodal Communication Research. Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management, 182-234. DOI: 10.1007/978-3-031-61066-0_13. Online publication date: 29-Jun-2024.

Published In

ICMI '18: Proceedings of the 20th International Conference on Multimodal Interaction: Adjunct
October 2018
62 pages
ISBN:9781450360029
DOI:10.1145/3281151

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. head movement classification
  2. multimodal features

Qualifiers

  • Research-article

Conference

ICMI '18
Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0
Reflects downloads up to 20 Jan 2025.
