DOI:10.1145/3342775.3342790
research-article

Crowdsourcing a self-evolving dialog graph

Published: 22 August 2019

Abstract

In this paper we present a crowdsourcing-based approach for collecting dialog data for a social chat dialog system. The approach gradually builds a dialog graph from actual user responses and crowd-sourced system answers, conditioned on a given persona and other instructions. It was tested during the second instalment of the Amazon Alexa Prize 2018 (AP2018), both to collect data and to feed a simple dialog system that used the graph to provide answers. As users interacted with the system, a graph that maintained the structure of the dialogs was built, identifying parts where more coverage was needed. In an offline evaluation, we compared the corpus collected during the competition with other potential corpora for training chatbots, including movie subtitles, online chat forums and conversational data. The results show that the proposed methodology creates data that is more representative of actual user utterances, and leads to more coherent and engaging answers from the agent. An implementation of the proposed method is available as open-source code.
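
As a rough illustration of the idea described in the abstract, the sketch below stores dialogs as alternating user and system turns and lists the user utterances that were seen repeatedly but still lack a system answer. It is not the authors' implementation: the class and method names (`DialogGraph`, `add_path`, `needs_coverage`, `add_system_answer`), the exact-match normalization, the visit threshold, and the tree-shaped simplification of the graph are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One utterance in the (hypothetical) dialog structure."""
    text: str
    speaker: str                                   # "system" or "user"
    visits: int = 0                                # times a dialog passed through this node
    children: dict = field(default_factory=dict)   # normalized text -> Node


def normalize(text: str) -> str:
    """Crude matching key (assumption); a real system would likely match more loosely."""
    return " ".join(text.lower().split())


class DialogGraph:
    """Simplified, tree-shaped stand-in for the dialog graph."""

    def __init__(self, opening: str):
        self.root = Node(opening, "system")

    def add_path(self, utterances):
        """Record one dialog: turns alternate user, system, user, ... below the root."""
        node = self.root
        node.visits += 1
        speaker = "user"
        for text in utterances:
            key = normalize(text)
            if key not in node.children:
                node.children[key] = Node(text, speaker)
            node = node.children[key]
            node.visits += 1
            speaker = "system" if speaker == "user" else "user"
        return node

    def add_system_answer(self, user_node: Node, text: str) -> Node:
        """Attach a crowd-sourced system answer to a user node."""
        answer = Node(text, "system")
        user_node.children[normalize(text)] = answer
        return answer

    def needs_coverage(self, min_visits: int = 2):
        """User nodes reached at least min_visits times that have no answer yet."""
        gaps, stack = [], [self.root]
        while stack:
            node = stack.pop()
            if (node.speaker == "user" and not node.children
                    and node.visits >= min_visits):
                gaps.append(node)
            stack.extend(node.children.values())
        return sorted(gaps, key=lambda n: -n.visits)


graph = DialogGraph("Hi, how are you?")
graph.add_path(["I'm fine", "Glad to hear it!", "what do you like"])
graph.add_path(["I'm fine", "Glad to hear it!", "What do you like"])
graph.add_path(["not great"])

# Candidate nodes to send to crowd workers, most frequently reached first.
gaps = graph.needs_coverage(min_visits=2)
for node in gaps:
    print(node.visits, node.text)
```

In this toy version, a node stops being reported once `add_system_answer` attaches a reply to it, which loosely mirrors the loop of collecting user responses and then crowdsourcing answers for the uncovered parts.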

Published In

CUI '19: Proceedings of the 1st International Conference on Conversational User Interfaces
August 2019
131 pages
ISBN:9781450371872
DOI:10.1145/3342775
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

  • CogSIS Project
  • ADAPT Centre
  • Irish Research Council

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. crowdsourcing
  2. datasets
  3. dialog systems
  4. human-computer interaction

Qualifiers

  • Research-article

Funding Sources

  • Swedish Foundation for Strategic Research
  • Swedish Research Council

Conference

CUI 2019

Acceptance Rates

CUI '19 Paper Acceptance Rate: 9 of 28 submissions, 32%
Overall Acceptance Rate: 34 of 100 submissions, 34%

Cited By

  • (2022) Unifying Recommender Systems and Conversational User Interfaces. Proceedings of the 4th Conference on Conversational User Interfaces, 1-7. DOI:10.1145/3543829.3544524. Online publication date: 26-Jul-2022
  • (2022) On the Use of Chatbots to Report Non-consensual Intimate Images Abuses: the Legal Expert Perspective. Proceedings of the 2022 ACM Conference on Information Technology for Social Good, 96-102. DOI:10.1145/3524458.3547247. Online publication date: 7-Sep-2022
  • (2022) Is a Wizard-of-Oz Required for Robot-Led Conversation Practice in a Second Language? International Journal of Social Robotics 14:4, 1067-1085. DOI:10.1007/s12369-021-00849-8. Online publication date: 5-Jan-2022
  • (2021) Participatory Development and Pilot Testing of an Adolescent Health Promotion Chatbot. Frontiers in Public Health 9. DOI:10.3389/fpubh.2021.724779. Online publication date: 11-Nov-2021
  • (2021) Crowdsourcing Ecologically-Valid Dialogue Data for German. Frontiers in Computer Science 3. DOI:10.3389/fcomp.2021.686050. Online publication date: 21-Jun-2021
  • (2021) ProtoChat. Proceedings of the ACM on Human-Computer Interaction 4:CSCW3, 1-27. DOI:10.1145/3432924. Online publication date: 5-Jan-2021
  • (2020) Decision Trees as Sociotechnical Objects in Chatbot Design. Proceedings of the 2nd Conference on Conversational User Interfaces, 1-3. DOI:10.1145/3405755.3406133. Online publication date: 22-Jul-2020
  • (2020) Model-Driven Chatbot Development. Conceptual Modeling, 207-222. DOI:10.1007/978-3-030-62522-1_15. Online publication date: 29-Oct-2020
