Multimodal Dialogue Data Collection and Analysis of Annotation Disagreement

Increasing Naturalness and Flexibility in Spoken Dialogue Interaction

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 714))

Abstract

We have been collecting multimodal dialogue data [1] to contribute to the development of multimodal dialogue systems that can take a user’s non-verbal behaviors into consideration. We recruited 30 participants from the general public, whose ages ranged from 20 to 50 and whose genders were roughly balanced. The consent form signed by the participants was updated so that the data can be distributed to researchers as long as it is used for research purposes. After the data collection, eight annotators were divided into three groups, and every exchange was assigned labels indicating how interested the participant appeared to be in the current topic. The labels given by the annotators do not always agree because they depend on subjective impressions. We therefore also analyzed the disagreement among annotators and the temporal changes in the impressions of the same annotators.
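As a brief illustration of how the annotation disagreement described above can be quantified, the following is a minimal sketch that computes Fleiss' kappa over per-exchange interest labels from multiple annotators; benchmarks for interpreting the resulting value are given by Landis and Koch [12]. The three-level label scheme and the toy ratings below are assumptions for illustration only, not the actual label set or data of the collected corpus.

    # Minimal sketch (Python): Fleiss' kappa over per-exchange interest labels.
    # The label scheme (1-3) and the toy ratings are hypothetical, not the corpus data.
    from collections import Counter

    def fleiss_kappa(label_matrix, categories):
        """Fleiss' kappa for items each rated by the same number of annotators.

        label_matrix: one list of labels per item (exchange), e.g. [[3, 2, 3], ...]
        categories:   the set of possible labels.
        """
        n_items = len(label_matrix)
        n_raters = len(label_matrix[0])

        # How many annotators chose each category, per item.
        counts = [Counter(labels) for labels in label_matrix]

        # Observed agreement per item, averaged over all items.
        p_items = [
            (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
            for cnt in counts
        ]
        p_bar = sum(p_items) / n_items

        # Chance agreement from the overall proportion of each category.
        p_cat = {
            j: sum(cnt.get(j, 0) for cnt in counts) / (n_items * n_raters)
            for j in categories
        }
        p_e = sum(p * p for p in p_cat.values())

        return (p_bar - p_e) / (1.0 - p_e)

    if __name__ == "__main__":
        # Hypothetical ratings: one row per exchange, one column per annotator,
        # with interest levels from 1 (not interested) to 3 (interested).
        ratings = [
            [3, 3, 2],
            [1, 1, 1],
            [2, 3, 2],
            [1, 2, 1],
        ]
        print(f"Fleiss' kappa: {fleiss_kappa(ratings, {1, 2, 3}):.3f}")

A value near 1 indicates near-perfect agreement among annotators, while a value near 0 indicates agreement no better than chance.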

Notes

  1. These activities are being conducted by a working group (Human-System Multimodal Dialogue Sharing Corpus Building Group) under SIG-SLUD of the Japanese Society for Artificial Intelligence (JSAI).

  2. https://developer.amazon.com/alexaprize.

  3. http://www.mmdagent.jp/.

  4. Two annotators gave labels to the data of all three groups.

References

  1. Araki M, Tomimasu S, Nakano M, Komatani K, Okada S, Fujie S, Sugiyama H (2018) Collection of multimodal dialog data and analysis of the result of annotation of users’ interest level. In: Proceedings of international conference on language resources and evaluation (LREC)

  2. Carletta J (2007) Unleashing the killer corpus: experiences in creating the multi-everything AMI meeting corpus. Lang Resour Eval 41(2):181–190

  3. Chen L, Rose RT, Qiao Y, Kimbara I, Parrill F, Welji H, Han TX, Tu J, Huang Z, Harper M, Quek F, Xiong Y, McNeill D, Tuttle R, Huang T (2006) VACE multimodal meeting corpus. In: Proceedings of the 2nd international conference on machine learning for multimodal interaction (MLMI05), pp 40–51. https://doi.org/10.1007/11677482_4

  4. Chiba Y, Ito M, Nose T, Ito A (2014) User modeling by using bag-of-behaviors for building a dialog system sensitive to the interlocutor’s internal state. In: Proceedings of annual meeting of the special interest group on discourse and dialogue (SIGDIAL), pp 74–78. http://www.aclweb.org/anthology/W14-4310

  5. Chollet M, Prendinger H, Scherer S (2016) Native vs. non-native language fluency implications on multimodal interaction for interpersonal skills training. In: Proceedings of international conference on multimodal interaction (ICMI), pp 386–393. http://doi.acm.org/10.1145/2993148.2993196

  6. Dhall A, Goecke R, Ghosh S, Joshi J, Hoey J, Gedeon T (2017) From individual to group-level emotion recognition: EmotiW 5.0. In: Proceedings of international conference on multimodal interaction (ICMI). ACM, New York, NY, USA, pp 524–528. http://doi.acm.org/10.1145/3136755.3143004

  7. Higashinaka R, Funakoshi K, Araki M, Tsukahara H, Kobayashi Y, Mizukami M (2015) Towards taxonomy of errors in chat-oriented dialogue systems. In: Proceedings of annual meeting of the special interest group on discourse and dialogue (SIGDIAL), pp 87–95

  8. Hirayama T, Sumi Y, Kawahara T, Matsuyama T (2011) Info-concierge: proactive multi-modal interaction through mind probing. In: The Asia Pacific signal and information processing association annual summit and conference (APSIPA ASC 2011)

  9. Inoue K, Lala D, Takanashi K, Kawahara T (2018) Latent character model for engagement recognition based on multimodal behaviors. In: Proceedings of international workshop on spoken dialogue systems (IWSDS)

  10. Janin A, Baron D, Edwards J, Ellis D, Gelbart D, Morgan N, Peskin B, Pfau T, Shriberg E, Stolcke A, Wooters C (2003) The ICSI meeting corpus. In: Proceedings of IEEE international conference on acoustics, speech & signal processing (ICASSP), pp I–364–I–367. https://doi.org/10.1109/ICASSP.2003.1198793

  11. Kumano S, Otsuka K, Matsuda M, Ishii R, Yamato J (2013) Using a probabilistic topic model to link observers’ perception tendency to personality. In: Proceedings of the ACM conference on affective computing and intelligent interaction (ACII), pp 588–593. https://doi.org/10.1109/ACII.2013.103

  12. Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174

  13. Nakano YI, Ishii R (2010) Estimating user’s engagement from eye-gaze behaviors in human-agent conversations. In: Proceedings of international conference on intelligent user interfaces (IUI), pp 139–148. https://doi.org/10.1145/1719970.1719990

  14. Ozkan D, Morency LP (2011) Modeling wisdom of crowds using latent mixture of discriminative experts. In: Proceedings of annual meeting of the association for computational linguistics (ACL): human language technologies (HLT), pp 335–340. http://dl.acm.org/citation.cfm?id=2002736.2002806

  15. Ozkan D, Sagae K, Morency L (2010) Latent mixture of discriminative experts for multimodal prediction modeling. In: Proceedings of international conference on computational linguistics (COLING), pp 860–868. http://aclweb.org/anthology/C10-1097

  16. Shibasaki Y, Funakoshi K, Shinoda K (2017) Boredom recognition based on users’ spontaneous behaviors in multiparty human-robot interactions. In: Proceedings of multimedia modeling, pp 677–689. https://doi.org/10.1007/978-3-319-51811-4_55

  17. Sidner C, Kidd C, Lee C, Lesh N (2004) Where to look: a study of human-robot engagement. In: Proceedings of international conference on intelligent user interfaces (IUI), pp 78–84. https://doi.org/10.1145/964442.964458

  18. Stratou G, Morency LP (2017) Multisense—context-aware nonverbal behavior analysis framework: a psychological distress use case. IEEE Trans Affect Comput 8(2):190–203. https://doi.org/10.1109/TAFFC.2016.2614300

  19. Tomimasu S, Araki M (2016) Assessment of users’ interests in multimodal dialog based on exchange unit. In: Proceedings of the workshop on multimodal analyses enabling artificial agents in human-machine interaction (MA3HMI’16). ACM, New York, NY, USA, pp 33–37. http://doi.acm.org/10.1145/3011263.3011269

  20. Vinciarelli A, Dielmann A, Favre S, Salamin H (2009) Canal9: a database of political debates for analysis of social interactions. In: 2009 3rd international conference on affective computing and intelligent interaction and workshops, pp 1–4. https://doi.org/10.1109/ACII.2009.5349466

  21. Waibel A, Stiefelhagen R (2009) Computers in the human interaction loop, 1st edn. Springer, Berlin

Acknowledgements

We thank the working group members who contributed to the annotation. Ms. Sayaka Tomimasu contributed to this project during the data collection. This work was partly supported by the Research Program of “Dynamic Alliance for Open Innovation Bridging Human, Environment and Materials” in Network Joint Research Center for Materials and Devices.

Author information

Corresponding author

Correspondence to Kazunori Komatani.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Komatani, K., Okada, S., Nishimoto, H., Araki, M., Nakano, M. (2021). Multimodal Dialogue Data Collection and Analysis of Annotation Disagreement. In: Marchi, E., Siniscalchi, S.M., Cumani, S., Salerno, V.M., Li, H. (eds) Increasing Naturalness and Flexibility in Spoken Dialogue Interaction. Lecture Notes in Electrical Engineering, vol 714. Springer, Singapore. https://doi.org/10.1007/978-981-15-9323-9_17

Download citation

  • DOI: https://doi.org/10.1007/978-981-15-9323-9_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-9322-2

  • Online ISBN: 978-981-15-9323-9

  • eBook Packages: Engineering, Engineering (R0)
