Abstract
We describe and analyze a new web-based spoken dialogue data collection framework. The framework enables the capture of conversational speech from two remote users who converse with each other and play a dialogue game entirely through their web browsers. We report on the substantial improvements in the speed and cost of data capture we have observed with this crowd-sourced paradigm. We also analyze a range of data quality factors by comparing a crowd-sourced data set involving 196 remote users to a smaller but more quality controlled lab-based data set. We focus our comparison on aspects that are especially important in our spoken dialogue research, including audio quality, the effect of communication latency on the interaction, our ability to synchronize the collected data, our ability to collect examples of excellent game play, and the naturalness of the resulting interactions. This analysis illustrates some of the current trade-offs between lab-based and crowd-sourced spoken dialogue data.
Notes
1. When the users listen through speakers, it often happens that one of their microphones picks up the speech output of their partner, and echo ensues. We currently do not attempt to cancel this echo.
2. [1] http://www.flickr.com/photos/jiuguangw/4981810943/, [2] http://www.flickr.com/photos/jiuguangw/4982411246/, [3] http://www.flickr.com/photos/alexhealing/2841176750/, [5] http://www.flickr.com/photos/ozzywu1974/325574892/, [6] http://www.flickr.com/photos/robocup2013/9154156312/, [7] http://www.flickr.com/photos/waagsociety/8463802099/, and [8] http://www.flickr.com/photos/jannem/1885853738/.
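Note 1 states that echo from speaker playback is not cancelled in the framework. For context only, modern browsers expose built-in acoustic echo cancellation through the `getUserMedia` audio constraints; the sketch below shows what requesting it could look like in a browser client. This is a hypothetical illustration, not part of the authors' system, and the specific settings chosen here are assumptions.

```javascript
// Hypothetical audio-capture constraints for a browser dialogue client.
// echoCancellation asks the browser's built-in AEC to suppress the
// partner's speech played through the local speakers.
const audioConstraints = {
  audio: {
    echoCancellation: true,   // suppress far-end audio picked up by the mic
    noiseSuppression: true,   // reduce stationary background noise
    autoGainControl: false,   // keep raw levels for research-quality audio
  },
  video: false,
};

// In a browser, the constraints would be passed to:
//   navigator.mediaDevices.getUserMedia(audioConstraints)
// which resolves to a MediaStream if the user grants microphone access.
```

Whether browser AEC is desirable for research data is a separate question: it alters the captured signal, which may conflict with the goal of collecting unprocessed conversational speech.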
Acknowledgements
We thank Maike Paetzel. This material is based upon work supported by the National Science Foundation under Grant No. IIS-1219253. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. For the images in Figs. 18.1 and 18.2, we thank (numbered 1–8 from left to right, top to bottom): [1,2] Jiuguang Wang (CC BY-SA 2.0), [3] Alex Haeling (CC BY 2.0), [4] NASA, [5] Joe Wu (CC BY-NC-SA 2.0), [6] RoboCup2013 (CC BY-NC-SA 2.0), [7] Waag Society (CC BY 2.0), and [8] Janne Moren (CC BY-NC-SA 2.0); image URLs are listed in Note 2.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Manuvinakurike, R., DeVault, D. (2015). Pair Me Up: A Web Framework for Crowd-Sourced Spoken Dialogue Collection. In: Lee, G., Kim, H., Jeong, M., Kim, JH. (eds) Natural Language Dialog Systems and Intelligent Assistants. Springer, Cham. https://doi.org/10.1007/978-3-319-19291-8_18
DOI: https://doi.org/10.1007/978-3-319-19291-8_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19290-1
Online ISBN: 978-3-319-19291-8
eBook Packages: Computer Science, Computer Science (R0)