ABSTRACT
Many people find it difficult to communicate in a foreign language. One approach being studied to help them is the use of captions generated by automatic speech recognition (ASR). Captions are known to facilitate comprehension of foreign languages, but ASR-generated captions can suffer from problems caused by recognition errors and by the delay that recognition time introduces.
We conducted two experiments with native Japanese speakers to determine how these ASR-induced differences affect comprehension when listening to English. We found that captions with 80% accuracy improve comprehension for subjects with intermediate English skills, a level that would apply to about half of native Japanese users. Additionally, changing the caption display timing from after the speech to before the speech would improve comprehension more than increasing accuracy from 80% to 100%.
These findings suggest that captions generated with today's ASR can help non-native speakers communicate in English when used carefully.
Index Terms
- Automatically generated captions: will they help non-native speakers communicate in English?