DOI: 10.1145/3219819.3219870
research-article

Multi-Task Learning with Neural Networks for Voice Query Understanding on an Entertainment Platform

Published: 19 July 2018

Abstract

We tackle the challenge of understanding voice queries posed against the Comcast Xfinity X1 entertainment platform, where consumers direct speech input at their "voice remotes". Such queries range from specific program navigation (e.g., watch a movie) to requests with vague intents and even queries that have nothing to do with watching TV. We present successively richer neural network architectures to tackle this challenge based on two key insights. The first is that session context can be exploited to disambiguate queries and recover from ASR errors, which we operationalize with hierarchical recurrent neural networks. The second is that query understanding requires evidence integration across multiple related tasks, which we identify as program prediction, intent classification, and query tagging. We present a novel multi-task neural architecture that jointly learns to accomplish all three tasks. Our initial model, already deployed in production, serves millions of queries daily with an improved customer experience. The novel multi-task learning model, first described here, is evaluated through carefully controlled laboratory experiments, which demonstrate further gains in effectiveness and increased system capabilities.
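To make the idea of jointly learning the three tasks concrete, the following is a minimal sketch of a shared encoder feeding three task heads with a summed loss. It is not the architecture from the paper: averaged word embeddings stand in for the recurrent encoder, session context is omitted entirely, the three losses are weighted equally, and every name (`TinyMultiTaskModel`, `joint_loss`, the label indices) is a hypothetical chosen for illustration.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def cross_entropy(probs, gold):
    """Negative log-likelihood of the gold label index."""
    return -math.log(probs[gold])


class TinyMultiTaskModel:
    """Hypothetical toy model: one shared encoder, three task heads."""

    def __init__(self, vocab, n_programs, n_intents, n_tags, dim=8, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.vocab = {word: i for i, word in enumerate(vocab)}
        self.embed = init(len(vocab), dim)          # shared by all tasks
        self.program_head = init(n_programs, dim)   # program prediction
        self.intent_head = init(n_intents, dim)     # intent classification
        self.tag_head = init(n_tags, dim)           # per-token query tagging

    def encode(self, tokens):
        # Assumption: mean of token embeddings replaces a recurrent encoder.
        token_vecs = [self.embed[self.vocab[t]] for t in tokens]
        dim = len(token_vecs[0])
        query_vec = [sum(v[d] for v in token_vecs) / len(token_vecs)
                     for d in range(dim)]
        return token_vecs, query_vec

    def forward(self, tokens):
        token_vecs, query_vec = self.encode(tokens)
        return {
            "program": softmax(linear(self.program_head, query_vec)),
            "intent": softmax(linear(self.intent_head, query_vec)),
            "tags": [softmax(linear(self.tag_head, v)) for v in token_vecs],
        }

    def joint_loss(self, tokens, gold_program, gold_intent, gold_tags):
        # Assumption: equal task weights; the tagging loss is a token average.
        out = self.forward(tokens)
        program_loss = cross_entropy(out["program"], gold_program)
        intent_loss = cross_entropy(out["intent"], gold_intent)
        tag_loss = sum(cross_entropy(p, g)
                       for p, g in zip(out["tags"], gold_tags)) / len(gold_tags)
        return program_loss + intent_loss + tag_loss


model = TinyMultiTaskModel(vocab=["watch", "star", "wars"],
                           n_programs=5, n_intents=3, n_tags=4)
loss = model.joint_loss(["watch", "star", "wars"],
                        gold_program=2, gold_intent=0, gold_tags=[0, 1, 1])
print(loss)
```

Because all three heads read from the same embeddings, a gradient step on the summed loss would update the shared parameters using evidence from every task, which is the general mechanism multi-task learning relies on. The sketch only computes the forward pass and the loss; training is left out.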

    Published In

    KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
    July 2018
    2925 pages
    ISBN:9781450355520
    DOI:10.1145/3219819

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. intelligent agent
    2. intent classification
    3. program prediction
    4. query tagging
    5. speech interface

    Acceptance Rates

    KDD '18 paper acceptance rate: 107 of 983 submissions, 11%
    Overall acceptance rate: 1,133 of 8,635 submissions, 13%
