Multi-Cast Attention Networks

Authors:

Siu Cheung Hui

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 2299 - 2308

https://doi.org/10.1145/3219819.3220048

Published: 19 July 2018

Abstract

Attention is typically used to select informative sub-phrases for prediction. This paper investigates a novel use of attention as a form of feature augmentation, i.e., casted attention. We propose Multi-Cast Attention Networks (MCAN), a new attention mechanism and general model architecture for a range of ranking tasks in the conversational modeling and question answering domains. Our approach performs a series of soft attention operations, each time casting a scalar feature upon the inner word embeddings. The key idea is to provide a real-valued hint (feature) to a subsequent encoder layer, with the aim of improving the representation learning process. This design has several advantages. It allows an arbitrary number of attention mechanisms to be cast, so that multiple attention types (e.g., co-attention, intra-attention) and attention variants (e.g., alignment-pooling, max-pooling, mean-pooling) can be executed simultaneously. This not only eliminates the costly need to tune the nature of the co-attention layer, but also gives practitioners greater explainability. Via extensive experiments on four well-known benchmark datasets, we show that MCAN achieves state-of-the-art performance. On the Ubuntu Dialogue Corpus, MCAN outperforms existing state-of-the-art models by 9%. MCAN also achieves the best-performing score to date on the well-studied TrecQA dataset.
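
The casting idea in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. All function names (`multi_cast`, `alignment_pool`, `extractive_pool`, `cast`) are hypothetical. The abstract only says that a scalar is cast upon each word embedding, so two details here are assumptions made for illustration: compressing each attended vector to a scalar with a dot product against the word embedding, and the way the max- and mean-pooling variants weight the query words.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mean(xs):
    return sum(xs) / len(xs)

def affinity(q, d):
    """Word-by-word similarity matrix s[i][j] = q_i . d_j."""
    return [[dot(qi, dj) for dj in d] for qi in q]

def alignment_pool(q, d, s):
    """Alignment-pooling: each query word gets a soft mixture of document words."""
    out = []
    for i in range(len(q)):
        w = softmax(s[i])
        out.append([sum(w[j] * d[j][k] for j in range(len(d)))
                    for k in range(len(q[i]))])
    return out

def extractive_pool(q, s, reduce_fn):
    """Max- or mean-pooling: weight each query word by a softmax over its
    row-pooled affinity (assumed form, for illustration only)."""
    w = softmax([reduce_fn(row) for row in s])
    return [[w[i] * x for x in q[i]] for i in range(len(q))]

def cast(q, attended_list):
    """Append one scalar hint per attention variant to every word embedding."""
    augmented = []
    for i, qi in enumerate(q):
        # Assumed compression: sum(attended ⊙ embedding), i.e. a dot product.
        hints = [dot(att[i], qi) for att in attended_list]
        augmented.append(list(qi) + hints)
    return augmented

def multi_cast(q, d):
    """Run three co-attention variants and cast all of them onto q."""
    s = affinity(q, d)
    casts = [
        alignment_pool(q, d, s),
        extractive_pool(q, s, max),
        extractive_pool(q, s, mean),
    ]
    return cast(q, casts)

# Toy inputs: two 2-d query words, three 2-d document words.
query = [[1.0, 0.0], [0.0, 1.0]]
document = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
augmented_query = multi_cast(query, document)
```

With these toy inputs, every 2-dimensional query embedding comes back as a 5-dimensional vector: the original embedding followed by one scalar per attention variant. A subsequent encoder layer would consume the augmented vectors. Adding another attention variant only adds one more scalar per word, which is why any number of variants can be cast side by side.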

Index Terms

Multi-Cast Attention Networks
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Learning to rank
    2. Retrieval tasks and goals
      1. Question answering

Published In

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

July 2018, 2925 pages

ISBN: 9781450355520

DOI: 10.1145/3219819

General Chairs: Yike Guo (Imperial College London), Faisal Farooq (IBM)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

KDD '18: The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 19 - 23, 2018

London, United Kingdom

Acceptance Rates

KDD '18 Paper Acceptance Rate 107 of 983 submissions, 11%;

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

Total Citations: 34
Total Downloads: 849
Downloads (Last 12 months): 19
Downloads (Last 6 weeks): 0

Reflects downloads up to 16 Feb 2025

Cited By

Chen G, Cheng X, Chen J, She X, Qin J, Chen J. (2024). Event assigning based on hierarchical features and enhanced association for Chinese mayor's hotline. Computational Intelligence 40:1. Online publication date: 4-Jan-2024. https://doi.org/10.1111/coin.12626
Fabre R, Azeroual O, Bellot P, Schöpfel J, Egret D. (2022). Retrieving Adversarial Cliques in Cognitive Communities: A New Conceptual Framework for Scientific Knowledge Graphs. Future Internet 14:9 (262). Online publication date: 7-Sep-2022. https://doi.org/10.3390/fi14090262
Li Y, Wang Z, Yu J. (2022). Densely Enhanced Semantic Network for Conversation System in Social Media. ACM Transactions on Multimedia Computing, Communications, and Applications 18:4 (1-24). Online publication date: 4-Mar-2022. https://dl.acm.org/doi/10.1145/3501799
Wang N, Chen R, Du K. (2022). A Study of Answer Selection Task Based on Deep Learning Methods. 2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE) (479-487). Online publication date: Apr-2022. https://doi.org/10.1109/AEMCSE55572.2022.00100
Zhou X, Wu O, Jiang C. (2022). Increasing naturalness of human–machine dialogue. Knowledge-Based Systems 243:C. Online publication date: 11-May-2022. https://dl.acm.org/doi/10.1016/j.knosys.2022.108485
Lu W, Zhao P, Li Y, Wang S, Huang H, Shi S, Wu H. (2022). Chinese sentence semantic matching based on multi-level relevance extraction and aggregation for intelligent human–robot interaction. Applied Soft Computing 131 (109795). Online publication date: Dec-2022. https://doi.org/10.1016/j.asoc.2022.109795
Chang G, Wang W, Hu S. (2022). MatchACNN: A Multi-Granularity Deep Matching Model. Neural Processing Letters 55:4 (4419-4438). Online publication date: 12-Oct-2022. https://doi.org/10.1007/s11063-022-11047-6
Abbasiantaeb Z, Momtazi S. (2022). Entity-aware answer sentence selection for question answering with transformer-based language models. Journal of Intelligent Information Systems 59:3 (755-777). Online publication date: 9-Jul-2022. https://doi.org/10.1007/s10844-022-00724-6
Abdel-Nabi H, Awajan A, Ali M. (2022). Deep learning-based question answering: a survey. Knowledge and Information Systems 65:4 (1399-1485). Online publication date: 30-Dec-2022. https://doi.org/10.1007/s10115-022-01783-5
Peng Y, Wang W, Bao F. (2022). Interactive Mongolian Question Answer Matching Model Based on Attention Mechanism in the Law Domain. Chinese Computational Linguistics (229-244). Online publication date: 6-Oct-2022. https://doi.org/10.1007/978-3-031-18315-7_15
