DOI: 10.1145/3219819.3220048
research-article

Multi-Cast Attention Networks

Published: 19 July 2018

Abstract

Attention is typically used to select informative sub-phrases that are used for prediction. This paper investigates the novel use of attention as a form of feature augmentation, i.e., casted attention. We propose Multi-Cast Attention Networks (MCAN), a new attention mechanism and general model architecture for a diverse range of ranking tasks in the conversational modeling and question answering domains. Our approach performs a series of soft attention operations, each time casting a scalar feature upon the inner word embeddings. The key idea is to provide a real-valued hint (feature) to a subsequent encoder layer, with the aim of improving the representation learning process. This design has several advantages. For example, it allows an arbitrary number of attention mechanisms to be cast, so that multiple attention types (e.g., co-attention, intra-attention) and attention variants (e.g., alignment-pooling, max-pooling, mean-pooling) can be executed simultaneously. This not only eliminates the costly need to tune the choice of co-attention layer, but also gives practitioners greater explainability. Through extensive experiments on four well-known benchmark datasets, we show that MCAN achieves state-of-the-art performance. On the Ubuntu Dialogue Corpus, MCAN outperforms existing state-of-the-art models by 9%. MCAN also achieves the best score to date on the well-studied TrecQA dataset.
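The central mechanism described in the abstract, running several attention variants and casting each result onto every word as one extra scalar feature, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MCAN's actual formulation: the abstract does not say how affinities are scored, how a pooled vector is compared with individual words, or how an attention output is reduced to a scalar. The dot-product affinity, the broadcasting of pooled summaries to every word, and the `compress` function below are therefore hypothetical choices, as are all function names.

```python
import math
from typing import Callable, List

Vector = List[float]
Seq = List[Vector]  # one embedding vector per word


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: Seq) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def affinity(a: Seq, b: Seq) -> List[List[float]]:
    # Assumption: unparameterized dot-product affinity s_ij = a_i . b_j.
    return [[dot(ai, bj) for bj in b] for ai in a]


def alignment_pool(a: Seq, b: Seq) -> Seq:
    # Co-attention, alignment variant: each word a_i gets its own soft alignment over b.
    return [weighted_sum(softmax(row), b) for row in affinity(a, b)]


def _pooled_summary(a: Seq, scores: List[float]) -> Seq:
    # One attention-weighted summary of a, broadcast to every word (assumption).
    summary = weighted_sum(softmax(scores), a)
    return [summary for _ in a]


def max_pool(a: Seq, b: Seq) -> Seq:
    # Co-attention, max-pooling variant: score each a_i by its best match in b.
    return _pooled_summary(a, [max(row) for row in affinity(a, b)])


def mean_pool(a: Seq, b: Seq) -> Seq:
    # Co-attention, mean-pooling variant: score each a_i by its average match in b.
    return _pooled_summary(a, [sum(row) / len(row) for row in affinity(a, b)])


def intra_attention(a: Seq, b: Seq) -> Seq:
    # Intra-attention: each word aligns against its own sentence; b is unused.
    return alignment_pool(a, a)


def compress(attended: Vector, word: Vector) -> float:
    # Hypothetical compression of (attention output, word) into one scalar.
    return dot(attended, word)


def multi_cast(a: Seq, b: Seq, casts: List[Callable[[Seq, Seq], Seq]]) -> Seq:
    # Run every attention cast and append one scalar per cast to each word of a.
    features = [[compress(att, w) for att, w in zip(cast(a, b), a)] for cast in casts]
    return [list(w) + [f[i] for f in features] for i, w in enumerate(a)]


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    d = [[1.0, 1.0], [0.5, 0.0], [0.0, 2.0]]
    casts = [alignment_pool, max_pool, mean_pool, intra_attention]
    augmented = multi_cast(q, d, casts)
    print([len(w) for w in augmented])  # [6, 6]
```

Each cast adds exactly one dimension per word, so four casts turn 2-dimensional embeddings into 6-dimensional inputs for a downstream encoder, and adding another attention variant only means appending it to `casts`. Casting onto the other sentence is the same call with the arguments swapped, e.g. `multi_cast(d, q, casts)`.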

Cited By

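  • (2024) Event assigning based on hierarchical features and enhanced association for Chinese mayor's hotline. Computational Intelligence 40:1. DOI: 10.1111/coin.12626. Online publication date: 4-Jan-2024.
  • (2022) Retrieving Adversarial Cliques in Cognitive Communities: A New Conceptual Framework for Scientific Knowledge Graphs. Future Internet 14:9 (262). DOI: 10.3390/fi14090262. Online publication date: 7-Sep-2022.
  • (2022) Densely Enhanced Semantic Network for Conversation System in Social Media. ACM Transactions on Multimedia Computing, Communications, and Applications 18:4 (1-24). DOI: 10.1145/3501799. Online publication date: 4-Mar-2022.
  • (2022) A Study of Answer Selection Task Based on Deep Learning Methods. 2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE), 479-487. DOI: 10.1109/AEMCSE55572.2022.00100. Online publication date: Apr-2022.
  • (2022) Increasing naturalness of human–machine dialogue. Knowledge-Based Systems 243:C. DOI: 10.1016/j.knosys.2022.108485. Online publication date: 11-May-2022.
  • (2022) Chinese sentence semantic matching based on multi-level relevance extraction and aggregation for intelligent human–robot interaction. Applied Soft Computing 131 (109795). DOI: 10.1016/j.asoc.2022.109795. Online publication date: Dec-2022.
  • (2022) MatchACNN: A Multi-Granularity Deep Matching Model. Neural Processing Letters 55:4 (4419-4438). DOI: 10.1007/s11063-022-11047-6. Online publication date: 12-Oct-2022.
  • (2022) Entity-aware answer sentence selection for question answering with transformer-based language models. Journal of Intelligent Information Systems 59:3 (755-777). DOI: 10.1007/s10844-022-00724-6. Online publication date: 9-Jul-2022.
  • (2022) Deep learning-based question answering: a survey. Knowledge and Information Systems 65:4 (1399-1485). DOI: 10.1007/s10115-022-01783-5. Online publication date: 30-Dec-2022.
  • (2022) Interactive Mongolian Question Answer Matching Model Based on Attention Mechanism in the Law Domain. Chinese Computational Linguistics, 229-244. DOI: 10.1007/978-3-031-18315-7_15. Online publication date: 6-Oct-2022.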

Information

Published In

ACM Other conferences
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN: 9781450355520
DOI: 10.1145/3219819
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. attention mechanism
  2. co-attention
  3. conversation modeling
  4. deep learning
  5. information retrieval
  6. intra-attention
  7. learning to rank
  8. neural networks
  9. neural ranking models
  10. qa
  11. question answering

Conference

KDD '18

Acceptance Rates

KDD '18 Paper Acceptance Rate 107 of 983 submissions, 11%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
