DOI: 10.1145/3331184.3331271
short-paper

Yelling at Your TV: An Analysis of Speech Recognition Errors and Subsequent User Behavior on Entertainment Systems

Published: 18 July 2019

Abstract

Millions of consumers issue voice queries through television-based entertainment systems such as the Comcast X1, the Amazon Fire TV, and Roku TV. Automatic speech recognition (ASR) systems transcribe these voice queries into text, which is then fed to downstream natural language understanding modules. However, ASR is far from perfect: it often produces incorrect transcriptions, forcing users to take corrective action. To better understand the impact of these errors on user sessions, this paper characterizes speech recognition errors as well as the user responses that follow them. We provide both quantitative and qualitative analyses, examining the acoustic and lexical attributes of the utterances. To our knowledge, this work is the first analysis of speech recognition errors from real users on a widely deployed entertainment system.

Information

Published In

SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2019
1512 pages
ISBN:9781450361729
DOI:10.1145/3331184
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. error characterization
  2. intelligent agents
  3. voice search

Conference

SIGIR '19

Acceptance Rates

SIGIR '19 paper acceptance rate: 84 of 426 submissions (20%)
Overall acceptance rate: 792 of 3,983 submissions (20%)

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 3
Reflects downloads up to 07 Mar 2025

Cited By

  • (2023) A Competition-Aware Approach to Accurate TV Show Recommendation. In 2023 IEEE 39th International Conference on Data Engineering (ICDE). 2822-2834. DOI: 10.1109/ICDE55515.2023.00216. Online publication date: Apr-2023.
  • (2022) Temporal Early Exiting for Streaming Speech Commands Recognition. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 7567-7571. DOI: 10.1109/ICASSP43922.2022.9746863. Online publication date: 23-May-2022.
  • (2021) La subtitulación intralingüística en la docencia de lenguas de especialidad [Intralingual subtitling in teaching languages for specific purposes]. Alsic. DOI: 10.4000/alsic.5409. Online publication date: 29-Dec-2021.
  • (2021) Characterizing Search Activities on Stack Overflow. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 919-931. DOI: 10.1145/3468264.3468582. Online publication date: 20-Aug-2021.
  • (2020) Auto-annotation for Voice-enabled Entertainment Systems. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1557-1560. DOI: 10.1145/3397271.3401241. Online publication date: 25-Jul-2020.
  • (2020) Conceptualizing Augmented Reality Television for the Living Room. In Proceedings of the 2020 ACM International Conference on Interactive Media Experiences. 1-12. DOI: 10.1145/3391614.3393660. Online publication date: 17-Jun-2020.
  • (2019) Challenges and Opportunities in Understanding Spoken Queries Directed at Modern Entertainment Platforms. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1375-1376. DOI: 10.1145/3331184.3331433. Online publication date: 18-Jul-2019.
