Joining External Context Characters to Improve Chinese Word Embedding

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10262)

Abstract

In Chinese, a word is usually composed of several characters, and its semantic meaning is related both to its component characters and to its context. Previous studies have shown that modeling a word's internal characters benefits the learning of word embeddings; however, these models ignore the characters of the external context. In this paper, we propose a novel Chinese word embedding model that considers both internal characters and external context characters. In this way, otherwise isolated characters become more strongly related to one another, and the character embeddings carry more semantic information, which improves the quality of the resulting Chinese word embeddings. Experimental results show that our model outperforms other word embedding methods on word relatedness computation, analogical reasoning, and text classification tasks, and that it is empirically robust to both the proportion of character modeling and the corpus size.
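The abstract summarizes the approach without giving the model equations. As a rough, non-authoritative illustration of the idea, the Python sketch below blends a word's embedding with its internal characters and with the characters of neighboring ("external") context words in a CBOW-style setup; the function names, the averaging scheme, and the mixing weight `lam` are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # toy dimensionality

# Toy vocabularies: segmented words and their component characters.
words = ["智能", "手机", "时代"]
chars = sorted({c for w in words for c in w})

# Separate lookup tables for word and character embeddings.
W = {w: rng.normal(scale=0.1, size=DIM) for w in words}
C = {c: rng.normal(scale=0.1, size=DIM) for c in chars}

def compose(word, external_chars, lam=0.5):
    """Blend a word's own embedding with the average of its internal
    characters and of the characters of surrounding context words.
    The equal weighting and the mixing factor `lam` are hypothetical;
    the paper's exact composition is not given in the abstract."""
    internal = np.mean([C[c] for c in word], axis=0)
    external = np.mean([C[c] for c in external_chars], axis=0)
    char_part = 0.5 * (internal + external)
    return lam * W[word] + (1.0 - lam) * char_part

# CBOW-style context vector for predicting the middle word "手机":
context = ["智能", "时代"]
external_for = {w: [c for u in context if u != w for c in u] for w in context}
h = np.mean([compose(w, external_for[w]) for w in context], axis=0)

# Score the target word by dot product with its word embedding.
score = h @ W["手机"]
print(f"context-target score: {score:.4f}")
```

In a trained model, these lookup tables would be learned jointly with a word2vec-style objective (e.g., negative sampling), so that both internal and external character signals shape the final word vectors.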

This work was supported by NSFC (No. 61632019) and 863 project of China (No. 2015AA015403).


Notes

  1. https://dumps.wikimedia.org/zhwiki/latest/.

  2. http://ictclas.nlpir.org.

  3. http://www.datatang.com/data/44139.


Author information

Correspondence to Wenxin Liang.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhang, X., Liu, S., Li, Y., Liang, W. (2017). Joining External Context Characters to Improve Chinese Word Embedding. In: Cong, F., Leung, A., Wei, Q. (eds) Advances in Neural Networks - ISNN 2017. ISNN 2017. Lecture Notes in Computer Science, vol. 10262. Springer, Cham. https://doi.org/10.1007/978-3-319-59081-3_48


  • DOI: https://doi.org/10.1007/978-3-319-59081-3_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59080-6

  • Online ISBN: 978-3-319-59081-3

  • eBook Packages: Computer Science, Computer Science (R0)
