Research article · DOI: 10.1145/3018661.3018737

Multi-Column Convolutional Neural Networks with Causality-Attention for Why-Question Answering

Published: 02 February 2017

ABSTRACT

Why-question answering (why-QA) is the task of retrieving answers (or answer passages) to why-questions (e.g., "Why are tsunamis generated?") from a text archive. Several previously proposed methods for why-QA improved their performance by automatically recognizing causalities that are expressed with explicit cues such as "because" in answer passages and using the recognized causalities as a clue for finding proper answers. However, causalities in answer passages may be expressed implicitly, i.e., without any explicit cue: "An earthquake suddenly displaced sea water and a tsunami was generated." Previous works did not deal with such implicitly expressed causalities and thus failed to find proper answers that contained them. We improve why-QA based on the following two ideas. First, a causality that is expressed implicitly in one text may be expressed in other texts with explicit cues. If we can automatically recognize such explicitly expressed causalities in a text archive and use them to complement the implicitly expressed causalities in an answer passage, we can improve why-QA. Second, the causes of similar events tend to be described with a similar set of words (e.g., "seismic energy" and "tectonic plates" for both "the Great East Japan Earthquake" and "the 1906 San Francisco Earthquake"). Hence, even if a text archive contains no explicitly expressed cause of the event in a question (e.g., "Why did the Great East Japan Earthquake happen?"), we may still be able to identify its implicitly expressed causes through words (e.g., "tectonic plates") that appear in the explicitly expressed cause of a similar event (e.g., "the 1906 San Francisco Earthquake").
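To make the first idea concrete, here is a minimal, purely illustrative Python sketch of harvesting explicitly cued causalities from an archive. The single English cue "because", the regex pattern, and the function name extract_explicit_causalities are our assumptions for illustration only; they are not the paper's actual recognizer, whose experiments target Japanese text and a richer cue inventory.

```python
import re

# Hypothetical sketch (not the authors' recognizer): harvest causalities that
# are explicitly marked with the cue "because", so they can later complement
# passages where the same causality is only implicit.
CAUSAL_CUE = re.compile(r"(?P<effect>[^.?!]+?)\s+because\s+(?P<cause>[^.?!]+)")

def extract_explicit_causalities(archive):
    """Return (cause, effect) pairs expressed with an explicit cue."""
    pairs = []
    for passage in archive:
        for match in CAUSAL_CUE.finditer(passage):
            pairs.append((match.group("cause").strip(),
                          match.group("effect").strip()))
    return pairs

archive = [
    "A tsunami was generated because an earthquake suddenly displaced sea water.",
    # Implicit causality: no cue word, so this toy extractor does not match it.
    "An earthquake suddenly displaced sea water and a tsunami was generated.",
]
print(extract_explicit_causalities(archive))
# [('an earthquake suddenly displaced sea water', 'A tsunami was generated')]
```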

We implemented these two ideas in our multi-column convolutional neural networks with a novel attention mechanism that we call causality attention. Through experiments on Japanese why-QA, we confirmed that our proposed method outperformed state-of-the-art systems.
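As a rough illustration of how such an attention mechanism might re-weight a passage before convolution, the following sketch weights each word's embedding by its association with known causes and feeds the result to one column of a multi-column CNN. The function names, the softmax weighting, and the toy scores are assumptions made here for exposition; this is not the paper's exact formulation of causality attention.

```python
import numpy as np

# Minimal sketch of a causality-attention weighting, under assumed definitions:
# each word of an answer passage gets a weight from its association with
# causality vocabulary (e.g., words from explicitly expressed causes of
# similar events), and the re-weighted embeddings feed one CNN column.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def causality_attention(embeddings, causality_scores):
    """embeddings: (n_words, dim); causality_scores: (n_words,) association
    strengths of each passage word with known causes."""
    weights = softmax(causality_scores)    # attention distribution over words
    return embeddings * weights[:, None]   # attended inputs for a CNN column

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 4))       # 6-word passage, 4-dim embeddings
# Assume words like "tectonic" and "plates" receive high causality scores.
scores = np.array([0.1, 2.0, 0.0, 1.5, 0.2, 0.0])
attended = causality_attention(embeddings, scores)
print(attended.shape)                      # (6, 4)
```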


Published in

WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining
February 2017, 868 pages
ISBN: 978-1-4503-4675-7
DOI: 10.1145/3018661
Publisher: Association for Computing Machinery, New York, NY, United States
Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Acceptance Rates

WSDM '17 paper acceptance rate: 80 of 505 submissions (16%). Overall WSDM acceptance rate: 498 of 2,863 submissions (17%).
