Research article · DOI: 10.1145/3018661.3018737

Multi-Column Convolutional Neural Networks with Causality-Attention for Why-Question Answering

Published: 02 February 2017

ABSTRACT

Why-question answering (why-QA) is the task of retrieving answers (or answer passages) to why-questions (e.g., "Why are tsunamis generated?") from a text archive. Several previously proposed methods for why-QA improved their performance by automatically recognizing causalities that are expressed with explicit cues such as "because" in answer passages and using the recognized causalities as a clue for finding proper answers. However, causalities in answer passages may be expressed implicitly, i.e., without any explicit cue: "An earthquake suddenly displaced sea water and a tsunami was generated." Previous works did not deal with such implicitly expressed causalities and thus failed to find proper answers that contained them. We improve why-QA based on the following two ideas. First, a causality that is expressed implicitly in one text may be expressed in other texts with explicit cues. If we can automatically recognize such explicitly expressed causalities in a text archive and use them to complement the implicitly expressed causalities in an answer passage, we can improve why-QA. Second, the causes of similar events tend to be described with a similar set of words (e.g., "seismic energy" and "tectonic plates" for both "the Great East Japan Earthquake" and "the 1906 San Francisco Earthquake"). Hence, even if a text archive contains no explicitly expressed cause of the event in a question (e.g., "Why did the Great East Japan Earthquake happen?"), we may still be able to identify its implicitly expressed causes through words (e.g., "tectonic plates") that appear in the explicitly expressed cause of a similar event (e.g., "the 1906 San Francisco Earthquake").
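To make the first idea concrete, here is a minimal, purely illustrative Python sketch of harvesting explicitly cued causalities from an archive. The single English cue "because", the regex pattern, and the function name extract_explicit_causalities are our assumptions for illustration only; they are not the paper's actual recognizer, whose experiments target Japanese text and a richer cue inventory.

```python
import re

# Hypothetical sketch (not the authors' recognizer): harvest causalities that
# are explicitly marked with the cue "because", so they can later complement
# passages where the same causality is only implicit.
CAUSAL_CUE = re.compile(r"(?P<effect>[^.?!]+?)\s+because\s+(?P<cause>[^.?!]+)")

def extract_explicit_causalities(archive):
    """Return (cause, effect) pairs expressed with an explicit cue."""
    pairs = []
    for passage in archive:
        for match in CAUSAL_CUE.finditer(passage):
            pairs.append((match.group("cause").strip(),
                          match.group("effect").strip()))
    return pairs

archive = [
    "A tsunami was generated because an earthquake suddenly displaced sea water.",
    # Implicit causality: no cue word, so this toy extractor does not match it.
    "An earthquake suddenly displaced sea water and a tsunami was generated.",
]
print(extract_explicit_causalities(archive))
# [('an earthquake suddenly displaced sea water', 'A tsunami was generated')]
```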

We implemented these two ideas in our multi-column convolutional neural networks with a novel attention mechanism that we call causality attention. Through experiments on Japanese why-QA, we confirmed that our proposed method outperformed state-of-the-art systems.
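As a rough illustration of how such an attention mechanism might re-weight a passage before convolution, the following sketch weights each word's embedding by its association with known causes and feeds the result to one column of a multi-column CNN. The function names, the softmax weighting, and the toy scores are assumptions made here for exposition; this is not the paper's exact formulation of causality attention.

```python
import numpy as np

# Minimal sketch of a causality-attention weighting, under assumed definitions:
# each word of an answer passage gets a weight from its association with
# causality vocabulary (e.g., words from explicitly expressed causes of
# similar events), and the re-weighted embeddings feed one CNN column.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def causality_attention(embeddings, causality_scores):
    """embeddings: (n_words, dim); causality_scores: (n_words,) association
    strengths of each passage word with known causes."""
    weights = softmax(causality_scores)    # attention distribution over words
    return embeddings * weights[:, None]   # attended inputs for a CNN column

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 4))       # 6-word passage, 4-dim embeddings
# Assume words like "tectonic" and "plates" receive high causality scores.
scores = np.array([0.1, 2.0, 0.0, 1.5, 0.2, 0.0])
attended = causality_attention(embeddings, scores)
print(attended.shape)                      # (6, 4)
```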


Published in

WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining
February 2017, 868 pages
ISBN: 978-1-4503-4675-7
DOI: 10.1145/3018661
Publisher: Association for Computing Machinery, New York, NY, United States
Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Acceptance Rates

WSDM '17 paper acceptance rate: 80 of 505 submissions (16%). Overall WSDM acceptance rate: 498 of 2,863 submissions (17%).
