Abstract
This article addresses the problem of learning compositional Chinese sentence representations, which express the meaning of a sentence by composing the meanings of its constituent words. In contrast to English, a Chinese word is composed of characters that carry rich semantic information, yet existing methods have not fully exploited this information. In this work, we introduce a novel mixed character-word architecture that improves Chinese sentence representations by exploiting the rich semantics of inner-word characters. We propose two strategies to this end. The first applies a mask gate to characters, learning the relations among the characters within a word. The second applies a max-pooling operation to words, adaptively finding the optimal mixture of the atomic and compositional word representations. Finally, we apply the proposed architecture to various sentence composition models, achieving substantial performance gains over baseline models on the sentence similarity task. To further verify the generalization ability of our model, we use the learned sentence representations as features in sentence classification, question classification, and sentence entailment tasks. Results show that the proposed mixed character-word sentence representation models outperform both character-based and word-based models.
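The two strategies can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the sigmoid gate form, the parameters `W_gate` and `b_gate`, and averaging the gated character vectors are all illustrative assumptions; only the overall shape (gate the characters, then take an element-wise max with the atomic word embedding) follows the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

D = 4  # embedding dimension (toy size)

# Hypothetical mask-gate parameters; learned jointly in practice.
W_gate = rng.normal(size=(D, D))
b_gate = rng.normal(size=D)

def compositional_word(char_embs):
    """Compose a word vector from its character vectors.
    A per-character mask gate decides how much each character
    contributes to the word meaning."""
    gates = sigmoid(char_embs @ W_gate + b_gate)  # one gate vector per character
    return (gates * char_embs).mean(axis=0)

def mixed_word(atomic_emb, char_embs):
    """Element-wise max-pooling adaptively picks, per dimension,
    between the atomic and the compositional word representation."""
    comp = compositional_word(char_embs)
    return np.maximum(atomic_emb, comp)

# Toy example: a two-character Chinese word.
chars = rng.normal(size=(2, D))   # character embeddings
atomic = rng.normal(size=D)       # atomic (whole-word) embedding
w = mixed_word(atomic, chars)
assert w.shape == (D,)
# Per dimension, the mixture dominates both source representations.
assert np.all(w >= atomic) and np.all(w >= compositional_word(chars))
```

The max-pooling lets each dimension of the final word vector come from whichever representation is stronger, so fully non-compositional words can fall back on their atomic embedding while transparent compounds benefit from character composition.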