ABSTRACT
Video captioning can increase the accessibility of information for people who are deaf or hard of hearing, and can also benefit second-language learners and reading-deficient students. We propose a caption editing system that harnesses crowdsourced work for the useful task of video captioning. To make the task an engaging activity, its interface incorporates game-like elements: non-expert users submit transcriptions of short video segments against a countdown timer, in either a "type" or a "fix" mode, to score points. Transcriptions from multiple users are then aligned and merged to form the final captions. Preliminary results with 42 participants and 578 short video segments show that, with two users per segment, the Word Error Rate of the merged captions improved from the 20.7% of the automatic speech recognition (ASR) output to 16%. Finally, we discuss work in progress to improve both the accuracy of the collected data and crowd engagement.
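The abstract reports results in terms of Word Error Rate (WER), the standard ASR quality metric: the word-level edit distance between a hypothesis transcript and the reference, divided by the number of reference words. The sketch below is a minimal illustration of that metric, not the authors' implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words (Levenshtein over words)
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, against the reference "the cat sat down", the hypothesis "the cat sat" has one deletion, giving a WER of 0.25. A 20.7% WER thus means roughly one word in five in the ASR output needs correction.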