
Radical and Stroke-Enhanced Chinese Word Embeddings Based on Neural Networks

Abstract

The internal structural information of words has proven to be very effective for learning Chinese word embeddings. However, most previous attempts extracted only a single form of internal feature when learning representations, ignoring the comprehensive combination of such information. Moreover, they focused only on the explicit features of internal structures, even though these structures also carry the implicit semantics of words. In this paper, we propose Radical and Stroke-enhanced Word Embeddings (RSWE), a novel neural-network-based method for learning Chinese word embeddings with joint guidance from semantic and morphological internal information. RSWE enables an embedding model to learn simultaneously from (1) implicit semantic information exploited from radicals, and (2) stroke n-gram information that can be explicitly obtained from Chinese words. During learning, RSWE uses stroke n-grams to capture the local structural features of words, and integrates the implicit information exploited from radicals to enhance the semantics of the embeddings. Through this combination, the semantics of Chinese words are effectively transferred into the learned embeddings. We evaluate the effectiveness of RSWE on word similarity computation, word analogy reasoning, performance over embedding dimensions, performance over learning corpus size, and named entity recognition. The experimental results show that our model outperforms existing state-of-the-art approaches.
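As a concrete illustration of the two signals the abstract describes, the sketch below enumerates stroke n-grams for a word and combines their vectors with a radical-derived vector in a skip-gram-style dot-product score. This is a minimal sketch only: the five-class stroke encoding, the toy STROKES map, the n-gram length range, and the additive combination are illustrative assumptions, not the authors' exact model.

```python
import numpy as np

# Toy character-to-stroke map using the common five-class stroke encoding
# (1 horizontal, 2 vertical, 3 left-falling, 4 right-falling, 5 turning).
# These two entries are illustrative; a real model covers the full lexicon.
STROKES = {
    "大": [1, 3, 4],  # horizontal, left-falling, right-falling
    "人": [3, 4],     # left-falling, right-falling
}

def stroke_ngrams(word, n_min=3, n_max=12):
    """Concatenate the stroke codes of a word's characters and list all
    stroke n-grams with lengths n_min..n_max (an assumed range)."""
    seq = [s for ch in word for s in STROKES.get(ch, [])]
    return [tuple(seq[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(seq) - n + 1)]

rng = np.random.default_rng(0)
dim = 50
grams = stroke_ngrams("大人")
gram_vecs = {g: rng.normal(size=dim) for g in grams}  # one vector per n-gram
radical_vec = rng.normal(size=dim)   # stand-in for the radical-derived signal
context_vec = rng.normal(size=dim)   # embedding of a context word

# Word representation: average of the stroke n-gram vectors plus the radical
# vector, scored against the context by a skip-gram-style dot product.
word_vec = np.mean([gram_vecs[g] for g in grams], axis=0) + radical_vec
print(float(word_vec @ context_vec))
```

In the full model, such scores would feed a negative-sampling objective over a large training corpus; here the vectors are random and only the data flow is shown.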




Acknowledgements

This work was supported by the Fundamental Research Funds for the Central Universities (No. 2019XD-A20).

Author information

Corresponding author

Correspondence to Wenan Zhou.

Ethics declarations

Conflict of interest: The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Wang, S., Zhou, W. & Zhou, Q. Radical and Stroke-Enhanced Chinese Word Embeddings Based on Neural Networks. Neural Process Lett 52, 1109–1121 (2020). https://doi.org/10.1007/s11063-020-10289-6
