research-article

Talking to Your TV: Context-Aware Voice Search with Hierarchical Recurrent Neural Networks

Authors:

Jimmy Lin

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

Pages 557 - 566

https://doi.org/10.1145/3132847.3132893

Published: 06 November 2017

Abstract

We tackle the novel problem of navigational voice queries posed against an entertainment system, where viewers interact with a voice-enabled remote controller to specify the TV program to watch. This is a difficult problem for several reasons: such queries are short, even shorter than comparable voice queries in other domains, which offers fewer opportunities for deciphering user intent. Furthermore, ambiguity is exacerbated by underlying speech recognition errors. We address these challenges by integrating word- and character-level query representations and by modeling voice search sessions to capture the contextual dependencies in query sequences. Both are accomplished with a probabilistic framework in which recurrent and feedforward neural network modules are organized in a hierarchical manner. From a raw dataset of 32M voice queries from 2.5M viewers on the Comcast Xfinity X1 entertainment system, we extracted data to train and test our models. We demonstrate the benefits of our hybrid representation and context-aware model, which significantly outperforms competitive baselines that use learning to rank as well as neural networks.
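The hierarchical organization described in the abstract can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the authors' model: it uses a plain tanh recurrent cell, untrained random weights, a hypothetical three-program label set, and hypothetical names (`encode_query`, `predict_session`). A character-level and a word-level recurrent encoder each read the query, their final states are concatenated into one query vector, and a session-level recurrent layer carries context from earlier queries into a softmax over programs.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained weights are reproducible

DIM = 8                                  # hidden/embedding size (assumption)
CHARS = "abcdefghijklmnopqrstuvwxyz "    # toy character vocabulary (assumption)
PROGRAMS = ["cnn", "hbo", "espn"]        # hypothetical label set


def rand_vector(n):
    return [random.uniform(-0.1, 0.1) for _ in range(n)]


def rand_matrix(rows, cols):
    return [rand_vector(cols) for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(w_in, w_rec, x, h):
    # Plain tanh recurrent cell; the abstract does not specify the cell type.
    a, b = matvec(w_in, x), matvec(w_rec, h)
    return [math.tanh(p + q) for p, q in zip(a, b)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


# Untrained parameters for the three recurrent levels and the output layer.
char_emb = {c: rand_vector(DIM) for c in CHARS}
word_emb = {}  # word embeddings are created lazily on first use
Wc_in, Wc_rec = rand_matrix(DIM, DIM), rand_matrix(DIM, DIM)      # character level
Ww_in, Ww_rec = rand_matrix(DIM, DIM), rand_matrix(DIM, DIM)      # word level
Ws_in, Ws_rec = rand_matrix(DIM, 2 * DIM), rand_matrix(DIM, DIM)  # session level
W_out = rand_matrix(len(PROGRAMS), DIM)                           # feedforward output


def embed_word(word):
    if word not in word_emb:
        word_emb[word] = rand_vector(DIM)
    return word_emb[word]


def encode_sequence(vectors, w_in, w_rec):
    h = [0.0] * DIM
    for x in vectors:
        h = rnn_step(w_in, w_rec, x, h)
    return h


def encode_query(query):
    # Hybrid representation: final character-level state + final word-level state.
    char_h = encode_sequence([char_emb[c] for c in query if c in char_emb], Wc_in, Wc_rec)
    word_h = encode_sequence([embed_word(w) for w in query.split()], Ww_in, Ww_rec)
    return char_h + word_h  # list concatenation, length 2 * DIM


def predict_session(queries):
    # The session-level state carries context from earlier queries forward.
    h = [0.0] * DIM
    distributions = []
    for q in queries:
        h = rnn_step(Ws_in, Ws_rec, encode_query(q), h)
        distributions.append(softmax(matvec(W_out, h)))
    return distributions


if __name__ == "__main__":
    for dist in predict_session(["c n n", "cnn"]):
        print({p: round(v, 3) for p, v in zip(PROGRAMS, dist)})
```

Because the session state is threaded through `predict_session`, the distribution for a query depends on the queries that preceded it in the same session, which is the contextual dependency the abstract refers to. With random weights the probabilities are near uniform; a trained model would be needed for meaningful predictions.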

Index Terms

1. Information systems
  1. Information retrieval

Published In

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

November 2017

2604 pages

ISBN: 9781450349185

DOI: 10.1145/3132847

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM '17

CIKM '17: ACM Conference on Information and Knowledge Management

November 6 - 10, 2017

Singapore, Singapore

Acceptance Rates

CIKM '17 Paper Acceptance Rate 171 of 855 submissions, 20%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
