
Empirical Exploring Word-Character Relationship for Chinese Sentence Representation

Published: 31 January 2018

Abstract

This article addresses the problem of learning compositional Chinese sentence representations, which represent the meaning of a sentence by composing the meanings of its constituent words. In contrast to English, a Chinese word is composed of characters, which carry rich semantic information; however, existing methods have not fully exploited this information. In this work, we introduce a novel mixed character-word architecture that improves Chinese sentence representations by exploiting the rich semantic information of inner-word characters. We propose two novel strategies to this end. The first applies a mask gate to characters to learn the relations among the characters within a word. The second applies a max-pooling operation to words to adaptively find the optimal mixture of the atomic and compositional word representations. Finally, the proposed architecture is applied to various sentence composition models and achieves substantial performance gains over baseline models on the sentence similarity task. To further verify the generalization ability of our model, we employ the learned sentence representations as features in sentence classification, question classification, and sentence entailment tasks. The results show that the proposed mixed character-word sentence representation models outperform both the character-based and word-based models.
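The two mechanisms named in the abstract can be illustrated with a minimal numerical sketch. This is not the paper's implementation: the gate parameterization (`W_gate`, `b_gate`), the averaging composition, and the dimensions are assumptions made for illustration; only the ideas of a sigmoid mask gate over inner-word characters and an element-wise max-pooling mixture of atomic and compositional word vectors come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (illustrative)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical mask-gate parameters; the abstract does not give the
# exact parameterization, so these are placeholders for the sketch.
W_gate = rng.normal(scale=0.1, size=(d, d))
b_gate = np.zeros(d)

def compositional_word(char_embs):
    """Compose a word vector from its character embeddings,
    scaling each character by a learned sigmoid mask gate."""
    gated = [sigmoid(W_gate @ c + b_gate) * c for c in char_embs]
    return np.mean(gated, axis=0)

def mixed_word(atomic_emb, char_embs):
    """Element-wise max-pooling over the atomic word embedding and
    the gated character composition, yielding the mixed representation."""
    return np.maximum(atomic_emb, compositional_word(char_embs))

# Example: a two-character Chinese word.
chars = [rng.normal(size=d), rng.normal(size=d)]
atomic = rng.normal(size=d)
w = mixed_word(atomic, chars)
```

By construction, the max-pooled vector dominates both inputs element-wise, so each dimension of the mixed representation comes from whichever source (atomic word or character composition) is stronger there.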


Cited By

  • (2022) Language cognition and language computation — human and machine language understanding. SCIENTIA SINICA Informationis 52:10, 1748. DOI: 10.1360/SSI-2021-01005. Online publication date: 9-Oct-2022.
  • (2022) A data processing method based on sequence labeling and syntactic analysis for extracting new sentiment words from product reviews. Soft Computing - A Fusion of Foundations, Methodologies and Applications 26:2, 853-866. DOI: 10.1007/s00500-021-06228-9. Online publication date: 1-Jan-2022.
  • (2020) Dynamically jointing character and word embedding for Chinese text classification. 2020 IEEE International Conference on Knowledge Graph (ICKG), 336-343. DOI: 10.1109/ICBK50248.2020.00055. Online publication date: Aug-2020.
  • (2019) Research and design of knowledge system construction system based on natural language processing. International Journal of Pattern Recognition and Artificial Intelligence. DOI: 10.1142/S0218001419590389. Online publication date: 25-Jan-2019.


      Published In

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 17, Issue 3
      September 2018
      196 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/3184403
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 31 January 2018
      Accepted: 01 October 2017
      Revised: 01 September 2017
      Received: 01 May 2017
      Published in TALLIP Volume 17, Issue 3


      Author Tags

      1. Sentence representation
      2. composition model
      3. inner-word character
      4. mask gate
      5. max pooling
      6. mixed character-word representation

      Qualifiers

      • Research-article
      • Research
      • Refereed


      Article Metrics

      • Downloads (last 12 months): 5
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 03 Mar 2025.
