research-article

Multi-Task Learning with Neural Networks for Voice Query Understanding on an Entertainment Platform

Authors:

Jimmy Lin

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 636 - 645

https://doi.org/10.1145/3219819.3219870

Published: 19 July 2018

Abstract

We tackle the challenge of understanding voice queries posed against the Comcast Xfinity X1 entertainment platform, where consumers direct speech input at their "voice remotes". Such queries range from specific program navigation (e.g., watch a movie) to requests with vague intents and even queries that have nothing to do with watching TV. We present successively richer neural network architectures to tackle this challenge based on two key insights. The first is that session context can be exploited to disambiguate queries and recover from ASR errors, which we operationalize with hierarchical recurrent neural networks. The second is that query understanding requires evidence integration across multiple related tasks, which we identify as program prediction, intent classification, and query tagging. We present a novel multi-task neural architecture that jointly learns to accomplish all three tasks. Our initial model, already deployed in production, serves millions of queries daily with an improved customer experience. The novel multi-task learning model, first described here, is evaluated through carefully controlled laboratory experiments, which demonstrate further gains in effectiveness and increased system capabilities.
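To make the idea of jointly learning the three tasks concrete, here is a minimal sketch, not the architecture from the paper. It assumes a single shared recurrent encoder whose final state feeds a program head and an intent head, while every per-token state feeds a tagging head, and it assumes the joint objective is an unweighted sum of cross-entropy losses. The class name `MultiTaskSketch`, the function `joint_loss`, the layer sizes, and the label indices are all hypothetical, and session context (the hierarchical part) is omitted for brevity.

```python
import math
import random

random.seed(0)  # fixed seed so the randomly initialised weights are reproducible


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiTaskSketch:
    """Shared encoder with three task heads (hypothetical, for illustration)."""

    def __init__(self, vocab, dim, n_programs, n_intents, n_tags):
        self.vocab = {w: i for i, w in enumerate(vocab)}
        self.dim = dim
        self.emb = rand_matrix(len(vocab), dim)         # word embeddings
        self.W_in = rand_matrix(dim, dim)               # input-to-hidden weights
        self.W_rec = rand_matrix(dim, dim)              # hidden-to-hidden weights
        self.program_head = rand_matrix(n_programs, dim)
        self.intent_head = rand_matrix(n_intents, dim)
        self.tag_head = rand_matrix(n_tags, dim)

    def encode(self, tokens):
        """Run a plain tanh RNN over the query; return one state per token."""
        h = [0.0] * self.dim
        states = []
        for tok in tokens:
            x = self.emb[self.vocab[tok]]
            a = matvec(self.W_in, x)
            b = matvec(self.W_rec, h)
            h = [math.tanh(u + v) for u, v in zip(a, b)]
            states.append(h)
        return states

    def forward(self, tokens):
        states = self.encode(tokens)
        final = states[-1]
        return {
            # query-level tasks read the final encoder state
            "program": softmax(matvec(self.program_head, final)),
            "intent": softmax(matvec(self.intent_head, final)),
            # the tagging task reads every per-token state
            "tags": [softmax(matvec(self.tag_head, h)) for h in states],
        }


def joint_loss(pred, gold_program, gold_intent, gold_tags):
    """Unweighted sum of the three cross-entropy losses (an assumption)."""
    loss = -math.log(pred["program"][gold_program])
    loss += -math.log(pred["intent"][gold_intent])
    loss += sum(-math.log(p[t]) for p, t in zip(pred["tags"], gold_tags))
    return loss


query = ["watch", "a", "movie"]
model = MultiTaskSketch(query, dim=8, n_programs=5, n_intents=3, n_tags=4)
pred = model.forward(query)
loss = joint_loss(pred, gold_program=2, gold_intent=0, gold_tags=[0, 0, 1])
```

Because all three heads read from the same encoder states, a gradient step on `loss` would update the shared weights using evidence from every task at once, which is the general motivation for multi-task learning. The sketch only computes the forward pass and the loss; training is left out.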

Index Terms

Multi-Task Learning with Neural Networks for Voice Query Understanding on an Entertainment Platform
1. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval
        Speech / audio search

Published In

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

July 2018

2925 pages

ISBN:9781450355520

DOI:10.1145/3219819

General Chairs: Yike Guo (Imperial College London), Faisal Farooq (IBM)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

KDD '18: The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 19 - 23, 2018

London, United Kingdom

Acceptance Rates

KDD '18 Paper Acceptance Rate 107 of 983 submissions, 11%;

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics

Total Citations: 12
Total Downloads: 572
Downloads (Last 12 months): 20
Downloads (Last 6 weeks): 4

Reflects downloads up to 07 Mar 2025

Cited By

Mohaimin I, Apong R, Damit A (2023). Part-of-Speech (POS) Tagging for Standard Brunei Malay: A Probabilistic and Neural-Based Approach. Journal of Advances in Information Technology 14:4, 830-837. Online publication date: 2023.
https://doi.org/10.12720/jait.14.4.830-837
Peng Z, Dave V, McNabb N, Sharnagat R, Magnani A, Liao C, Fang Y, Rajanala S, Singh A, Sun Y, Akoglu L, Gunopulos D, Yan X, Kumar R, Ozcan F, Ye J (2023). Entity-aware Multi-task Learning for Query Understanding at Walmart. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4733-4742. Online publication date: 6-Aug-2023.
https://dl.acm.org/doi/10.1145/3580305.3599816
Ma H, Guo H, Lau V (2023). Communication-Efficient Federated Multitask Learning Over Wireless Networks. IEEE Internet of Things Journal 10:1, 609-624. Online publication date: 1-Jan-2023.
https://doi.org/10.1109/JIOT.2022.3201310
Li Y, Che X, Huang Y, Wang J, Wang S, Wang Y, Wang Q (2022). A Tale of Two Tasks: Automated Issue Priority Prediction with Deep Multi-task Learning. Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement, 1-11. Online publication date: 19-Sep-2022.
https://dl.acm.org/doi/10.1145/3544902.3546257
Chen Q, Du J, Allot A, Lu Z (2022). LitMC-BERT: Transformer-Based Multi-Label Classification of Biomedical Literature With An Application on COVID-19 Literature Curation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 19:5, 2584-2595. Online publication date: 1-Sep-2022.
https://doi.org/10.1109/TCBB.2022.3173562
Tomasi F, Mehrotra R, Pappu A, Bütepage J, Brost B, Galvão H, Lalmas M, d'Aquin M, Dietze S, Hauff C, Curry E, Cudre Mauroux P (2020). Query Understanding for Surfacing Under-served Music Content. Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2765-2772. Online publication date: 19-Oct-2020.
https://dl.acm.org/doi/10.1145/3340531.3412741
Li M, Shi L, Yang Y, Wang Q, Grundy J, Le Goues C, Lo D (2020). A deep multitask learning approach for requirements discovery and annotation from open forum. Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, 336-348. Online publication date: 21-Dec-2020.
https://dl.acm.org/doi/10.1145/3324884.3416627
Kavmini L, Dinushika T, Thayasivam U, Jayasena S (2020). Improved Speech Command Classification System for Sinhala Language based on Automatic Speech Recognition. International Journal of Asian Language Processing 30:02, 2050009. Online publication date: 26-Sep-2020.
https://doi.org/10.1142/S2717554520500095
Tian B, Zhang Y, Wang J, Xing C (2019). Hierarchical inter-attention network for document classification with multi-task learning. Proceedings of the 28th International Joint Conference on Artificial Intelligence, 3569-3575. Online publication date: 10-Aug-2019.
https://dl.acm.org/doi/10.5555/3367471.3367537
Ture F, Rao J, Tang R, Lin J, Piwowarski B, Chevalier M, Gaussier E, Maarek Y, Nie J, Scholer F (2019). Challenges and Opportunities in Understanding Spoken Queries Directed at Modern Entertainment Platforms. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 1375-1376. Online publication date: 18-Jul-2019.
https://dl.acm.org/doi/10.1145/3331184.3331433
