Research Article
DOI: 10.1145/3132218.3132236

Siamese Network with Soft Attention for Semantic Text Understanding

Published: 11 September 2017

ABSTRACT

We propose a task-independent neural network model based on a Siamese-twin architecture. Our model benefits from two forms of attention, which we use to extract high-level feature representations of the underlying texts at both the word level (intra-attention) and the sentence level (inter-attention). The inter-attention scheme uses one of the texts to create a contextual interlock with the other, thus attending to their mutually important parts. We evaluate our system on three tasks: Textual Entailment, Paraphrase Detection, and Answer-Sentence Selection. We achieve a near state-of-the-art result on the textual entailment task with the SNLI corpus while obtaining strong performance on the other tasks.
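To make the inter-attention scheme concrete, below is a minimal sketch of soft inter-attention between two texts encoded by a shared (Siamese) encoder. It is written in PyTorch; all names, dimensions, the dot-product scoring function, and the mean pooling are illustrative assumptions rather than the authors' implementation, and the intra-attention component is omitted for brevity.

```python
# Minimal sketch of a Siamese encoder with soft inter-attention.
# Illustrative only: layer sizes, scoring, and pooling are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseSoftAttention(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # A single encoder applied to both texts: the Siamese constraint.
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)

    def forward(self, text_a, text_b):
        # Encode both texts with shared weights: (batch, len, 2*hidden).
        h_a, _ = self.encoder(self.embed(text_a))
        h_b, _ = self.encoder(self.embed(text_b))
        # Inter-attention: score every word pair across the two texts,
        # creating the "contextual interlock" between them.
        scores = torch.bmm(h_a, h_b.transpose(1, 2))   # (batch, len_a, len_b)
        attn_a = F.softmax(scores, dim=2)              # A attends over B
        attn_b = F.softmax(scores, dim=1)              # B attends over A
        a_aligned = torch.bmm(attn_a, h_b)                  # B-aware view of A
        b_aligned = torch.bmm(attn_b.transpose(1, 2), h_a)  # A-aware view of B
        # Pool to fixed-size sentence vectors for a downstream classifier.
        v_a = a_aligned.mean(dim=1)
        v_b = b_aligned.mean(dim=1)
        return torch.cat([v_a, v_b, torch.abs(v_a - v_b)], dim=1)
```

The key point the sketch illustrates is that the attention weights are computed from both texts jointly, so each sentence representation is conditioned on the other, while the shared encoder weights are what make the architecture Siamese.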


Published in
SEMANTiCS 2017: Proceedings of the 13th International Conference on Semantic Systems
September 2017, 202 pages
ISBN: 9781450352963
DOI: 10.1145/3132218

Copyright © 2017 ACM

Publisher
Association for Computing Machinery, New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

Overall acceptance rate: 40 of 182 submissions, 22%
