Empirical Exploring Word-Character Relationship for Chinese Sentence Representation

Published: 31 January 2018

Abstract

This article addresses the problem of learning compositional Chinese sentence representations, which represent the meaning of a sentence by composing the meanings of its constituent words. In contrast to English, a Chinese word is composed of characters, which carry rich semantic information; however, existing methods have not fully exploited this information. In this work, we introduce a novel mixed character-word architecture that improves Chinese sentence representations by utilizing the rich semantic information of inner-word characters. We propose two novel strategies to this end. The first applies a mask gate to characters, learning the relations among the characters within a word. The second applies a max-pooling operation to words, adaptively finding the optimal mixture of the atomic and compositional word representations. Finally, the proposed architecture is applied to various sentence composition models and achieves substantial performance gains over baseline models on the sentence similarity task. To further verify the generalization ability of our model, we employ the learned sentence representations as features in sentence classification, question classification, and sentence entailment tasks. Results show that the proposed mixed character-word sentence representation models outperform both character-based and word-based models.
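The two strategies described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the dimensions, parameter names (`W_gate`), gating form, and averaging composition are all illustrative assumptions, since the abstract gives no concrete details.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # embedding dimension (assumed)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Strategy 1: a mask gate over the characters of one word.
# Each character embedding is scaled by a gate value, modeling how the
# characters relate to one another inside the word.
char_embs = rng.standard_normal((2, d))          # a two-character word
W_gate = rng.standard_normal((d, d))             # hypothetical gate parameters
gates = sigmoid(char_embs @ W_gate)              # per-dimension gate values
compositional = (gates * char_embs).sum(axis=0)  # character-composed word vector

# Strategy 2: element-wise max-pooling between the atomic (whole-word)
# embedding and the character-compositional embedding, so the model can
# adaptively keep the stronger signal in each dimension.
atomic = rng.standard_normal(d)                  # whole-word embedding
word_rep = np.maximum(atomic, compositional)     # mixed word representation

# A sentence representation then composes these mixed word vectors,
# e.g. by averaging (one of many possible composition models).
sentence = np.stack([word_rep, word_rep]).mean(axis=0)
print(word_rep.shape, sentence.shape)
```

The max-pooling step guarantees that the mixed representation dominates both inputs element-wise, which is one simple way to realize an "optimal mixture" without extra parameters.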



• Published in

  ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 17, Issue 3
  September 2018, 196 pages
  ISSN: 2375-4699
  EISSN: 2375-4702
  DOI: 10.1145/3184403

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 31 January 2018
        • Accepted: 1 October 2017
        • Revised: 1 September 2017
        • Received: 1 May 2017
Published in TALLIP Volume 17, Issue 3


        Qualifiers

        • research-article
        • Research
        • Refereed
