short-paper

What Do Viewers Say to Their TVs?: An Analysis of Voice Queries to Entertainment Systems

Authors:
Jinfeng Rao, University of Maryland & Comcast Applied AI Research Lab, College Park, MD, USA
Ferhan Ture, Comcast Applied AI Research Lab, Washington, DC, USA
Jimmy Lin, University of Waterloo, Waterloo, ON, Canada

SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, June 2018, Pages 1213–1216. https://doi.org/10.1145/3209978.3210140

Published: 27 June 2018

ABSTRACT

A recently introduced product of Comcast, a large cable company in the United States, is a "voice remote" that accepts spoken queries from viewers. We present an analysis of a large query log from this service to answer the question: "What do viewers say to their TVs?" In addition to a descriptive characterization of queries and sessions, we describe two complementary types of analysis to support query understanding. First, we propose a domain-specific intent taxonomy to characterize viewer behavior: as expected, most intents revolve around watching programs (both direct navigation and browsing), but there is a non-trivial fraction of non-viewing intents as well. Second, we propose a domain-specific tagging scheme for labeling query tokens which, when combined with intent and program prediction, provides a multi-faceted approach to understanding voice queries directed at entertainment systems.
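
As a rough illustration of how the two kinds of annotation described above could sit side by side on one query, the following minimal Python sketch pairs a query-level intent with per-token tags. Everything in it is an assumption made for illustration: the intent names, the tag names, the `AnnotatedQuery` class, and the three sample queries are invented here and are not the taxonomy, tag set, or data from the paper, which the abstract does not enumerate.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical label set, loosely following the abstract's wording
# (direct navigation, browsing, non-viewing). Not the paper's taxonomy.
VIEWING_INTENTS = {"direct_navigation", "browse"}


@dataclass
class AnnotatedQuery:
    text: str                    # transcribed voice query
    intent: str                  # one query-level intent label
    tags: List[Tuple[str, str]]  # (token, tag) pairs, one per token

    def is_viewing(self) -> bool:
        """True if the intent is one of the viewing intents."""
        return self.intent in VIEWING_INTENTS

    def tokens_with_tag(self, tag: str) -> List[str]:
        """Return the tokens that carry the given tag, in query order."""
        return [token for token, t in self.tags if t == tag]


def viewing_fraction(queries: List[AnnotatedQuery]) -> float:
    """Share of queries whose intent is a viewing intent."""
    if not queries:
        return 0.0
    return sum(q.is_viewing() for q in queries) / len(queries)


# Invented sample queries with invented tags, for illustration only.
log = [
    AnnotatedQuery("watch nbc", "direct_navigation",
                   [("watch", "ACTION"), ("nbc", "CHANNEL")]),
    AnnotatedQuery("free kids movies", "browse",
                   [("free", "PRICE"), ("kids", "AUDIENCE"), ("movies", "GENRE")]),
    AnnotatedQuery("show my bill", "non_viewing",
                   [("show", "ACTION"), ("my", "OTHER"), ("bill", "ACCOUNT")]),
]
```

With annotations in this shape, `viewing_fraction(log)` gives the share of viewing intents in a log and `tokens_with_tag` pulls out the tokens of one type, which is the kind of per-facet summary a query-log analysis might report.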

Index Terms

1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
      1. Query log analysis
    2. Specialized information retrieval
      1. Multimedia and multimodal retrieval
        1. Speech / audio search

Published in
SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
June 2018
1509 pages
ISBN:9781450356572
DOI:10.1145/3209978
General Chairs: Kevyn Collins-Thompson (University of Michigan, United States), Qiaozhu Mei (University of Michigan, United States)
Program Chairs: Brian Davison (Lehigh University, United States), Yiqun Liu (Tsinghua University, China), Emine Yilmaz (University College London, United Kingdom)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
intelligent agents
voice queries
speech interfaces
tv
Qualifiers
- short-paper
Acceptance Rates
SIGIR '18 Paper Acceptance Rate: 86 of 409 submissions, 21%. Overall Acceptance Rate: 792 of 3,983 submissions, 20%.