Abstract
This article addresses the problem of learning compositional Chinese sentence representations, which express the meaning of a sentence by composing the meanings of its constituent words. In contrast to English, a Chinese word is composed of characters that carry rich semantic information, yet existing methods have not fully exploited this information. In this work, we introduce a novel mixed character-word architecture that improves Chinese sentence representations by exploiting the rich semantics of inner-word characters. We propose two strategies to this end. The first applies a mask gate to characters, learning the relations among the characters within a word. The second applies a max-pooling operation to words, adaptively finding the optimal mixture of the atomic and compositional word representations. Finally, we apply the proposed architecture to various sentence composition models, achieving substantial performance gains over baseline models on the sentence similarity task. To further verify the generalization ability of our model, we use the learned sentence representations as features in sentence classification, question classification, and sentence entailment tasks. Results show that the proposed mixed character-word sentence representation models outperform both character-based and word-based models.
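The two strategies can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the sigmoid gate form, the parameters `W_gate` and `b_gate`, and averaging the gated character vectors are all illustrative assumptions; only the overall shape (gate the characters, then take an element-wise max with the atomic word embedding) follows the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

D = 4  # embedding dimension (toy size)

# Hypothetical mask-gate parameters; learned jointly in practice.
W_gate = rng.normal(size=(D, D))
b_gate = rng.normal(size=D)

def compositional_word(char_embs):
    """Compose a word vector from its character vectors.
    A per-character mask gate decides how much each character
    contributes to the word meaning."""
    gates = sigmoid(char_embs @ W_gate + b_gate)  # one gate vector per character
    return (gates * char_embs).mean(axis=0)

def mixed_word(atomic_emb, char_embs):
    """Element-wise max-pooling adaptively picks, per dimension,
    between the atomic and the compositional word representation."""
    comp = compositional_word(char_embs)
    return np.maximum(atomic_emb, comp)

# Toy example: a two-character Chinese word.
chars = rng.normal(size=(2, D))   # character embeddings
atomic = rng.normal(size=D)       # atomic (whole-word) embedding
w = mixed_word(atomic, chars)
assert w.shape == (D,)
# Per dimension, the mixture dominates both source representations.
assert np.all(w >= atomic) and np.all(w >= compositional_word(chars))
```

The max-pooling lets each dimension of the final word vector come from whichever representation is stronger, so fully non-compositional words can fall back on their atomic embedding while transparent compounds benefit from character composition.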