Abstract
In human-to-human communication, gesture and speech co-occur in tight temporal synchrony, and gestures often complement or emphasize speech. In human–computer interaction systems, natural, affective, and believable use of gestures would be a valuable key component in adopting and emphasizing human-centered design. However, natural and affective multimodal data for studying computational models of gesture and speech are scarce. In this study, we introduce the JESTKOD database, which consists of speech and full-body motion-capture recordings of dyadic interactions under agreement and disagreement scenarios. The participants of the dyadic interactions are native Turkish speakers, and the recordings of each participant are rated in a dimensional affect space. We present our multimodal data collection and annotation process, as well as preliminary experimental studies on agreement/disagreement classification of dyadic interactions using body gesture and speech data. The JESTKOD database provides a valuable asset for investigating gesture and speech toward designing more natural and affective human–computer interaction systems.
Notes
1. Flex 13 system: http://www.optitrack.com/products/flex-13/.
2. Motive optical motion capture software: http://www.optitrack.com/products/motive/.
3. The JESTKOD project is supported by TÜBİTAK under Grant Number 113E102.
4. The JESTKOD database: http://mvgl.ku.edu.tr/databases/.
Acknowledgements
This work is supported by TÜBİTAK under Grant Number 113E102.
Cite this article
Bozkurt, E., Khaki, H., Keçeci, S. et al. The JESTKOD database: an affective multimodal database of dyadic interactions. Lang Resources & Evaluation 51, 857–872 (2017). https://doi.org/10.1007/s10579-016-9377-0