Abstract
We describe and analyze a new web-based spoken dialogue data collection framework. The framework enables the capture of conversational speech from two remote users who converse with each other and play a dialogue game entirely through their web browsers. We report on the substantial improvements in the speed and cost of data capture we have observed with this crowd-sourced paradigm. We also analyze a range of data quality factors by comparing a crowd-sourced data set involving 196 remote users to a smaller but more quality controlled lab-based data set. We focus our comparison on aspects that are especially important in our spoken dialogue research, including audio quality, the effect of communication latency on the interaction, our ability to synchronize the collected data, our ability to collect examples of excellent game play, and the naturalness of the resulting interactions. This analysis illustrates some of the current trade-offs between lab-based and crowd-sourced spoken dialogue data.
Notes
1. When the users listen through speakers, it often happens that one of their microphones picks up the speech output of their partner, and echo ensues. We currently do not attempt to cancel this echo.
2. [1] http://www.flickr.com/photos/jiuguangw/4981810943/, [2] http://www.flickr.com/photos/jiuguangw/4982411246/, [3] http://www.flickr.com/photos/alexhealing/2841176750/, [5] http://www.flickr.com/photos/ozzywu1974/325574892/, [6] http://www.flickr.com/photos/robocup2013/9154156312/, [7] http://www.flickr.com/photos/waagsociety/8463802099/, and [8] http://www.flickr.com/photos/jannem/1885853738/.
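Note 1 states that echo from speaker playback is not cancelled in the framework. For context only, modern browsers expose built-in acoustic echo cancellation through the `getUserMedia` audio constraints; the sketch below shows what requesting it could look like in a browser client. This is a hypothetical illustration, not part of the authors' system, and the specific settings chosen here are assumptions.

```javascript
// Hypothetical audio-capture constraints for a browser dialogue client.
// echoCancellation asks the browser's built-in AEC to suppress the
// partner's speech played through the local speakers.
const audioConstraints = {
  audio: {
    echoCancellation: true,   // suppress far-end audio picked up by the mic
    noiseSuppression: true,   // reduce stationary background noise
    autoGainControl: false,   // keep raw levels for research-quality audio
  },
  video: false,
};

// In a browser, the constraints would be passed to:
//   navigator.mediaDevices.getUserMedia(audioConstraints)
// which resolves to a MediaStream if the user grants microphone access.
```

Whether browser AEC is desirable for research data is a separate question: it alters the captured signal, which may conflict with the goal of collecting unprocessed conversational speech.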
Acknowledgements
We thank Maike Paetzel. This material is based upon work supported by the National Science Foundation under Grant No. IIS-1219253. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. For the images in Figs. 18.1 and 18.2, we thank (numbered 1–8 from left to right, top to bottom): [1,2] Jiuguang Wang (CC BY-SA 2.0), [3] Alex Haeling (CC BY 2.0), [4] NASA, [5] Joe Wu (CC BY-NC-SA 2.0), [6] RoboCup2013 (CC BY-NC-SA 2.0), [7] Waag Society (CC BY 2.0), and [8] Janne Moren (CC BY-NC-SA 2.0); image URLs are listed in Note 2.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Manuvinakurike, R., DeVault, D. (2015). Pair Me Up: A Web Framework for Crowd-Sourced Spoken Dialogue Collection. In: Lee, G., Kim, H., Jeong, M., Kim, JH. (eds) Natural Language Dialog Systems and Intelligent Assistants. Springer, Cham. https://doi.org/10.1007/978-3-319-19291-8_18
DOI: https://doi.org/10.1007/978-3-319-19291-8_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19290-1
Online ISBN: 978-3-319-19291-8
eBook Packages: Computer Science, Computer Science (R0)