Abstract
The internal structural information of words has proven highly effective for learning Chinese word embeddings. However, most previous attempts extracted only a single form of internal feature when learning representations, ignoring the comprehensive combination of such information. Moreover, they focused only on the explicit features of internal structures, even though these structures also carry implicit word semantics. In this paper, we propose Radical and Stroke-enhanced Word Embeddings (RSWE), a novel neural-network-based method for learning Chinese word embeddings with joint guidance from semantic and morphological internal information. RSWE enables an embedding model to learn simultaneously from (1) implicit semantic information exploited from radicals, and (2) stroke n-gram information explicitly obtained from Chinese words. During learning, RSWE uses stroke n-grams to capture the local structural features of words and integrates the implicit information exploited from radicals to enhance the semantics of the embeddings. Through this combination, the semantics of Chinese words are effectively transferred into the learned embeddings. We evaluate the effectiveness of RSWE on word similarity computation, word analogy reasoning, performance over embedding dimensions, performance over learning corpus size, and named entity recognition. The experimental results show that our model outperforms existing state-of-the-art approaches.
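To make the stroke n-gram idea concrete: in the cw2vec convention that this line of work builds on, each character's strokes are encoded as integer IDs (1 horizontal, 2 vertical, 3 left-falling, 4 right-falling, 5 turning), a word's stroke sequence is the concatenation of its characters' strokes, and all contiguous subsequences within a size window are enumerated as features. The sketch below is illustrative only; the function name, ID scheme, and window sizes are assumptions, not details taken from the paper.

```python
def stroke_ngrams(strokes, n_min=3, n_max=5):
    """Enumerate contiguous stroke n-grams from a word's stroke sequence.

    `strokes` is a list of stroke-type IDs (cw2vec convention:
    1=horizontal, 2=vertical, 3=left-falling, 4=right-falling, 5=turning).
    Returns every contiguous subsequence of length n_min..n_max.
    """
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(strokes) - n + 1):
            grams.append(tuple(strokes[i:i + n]))
    return grams

# Example: a four-stroke sequence yields two 3-grams and one 4-gram.
print(stroke_ngrams([1, 2, 3, 4]))
```

Each resulting n-gram would then index a feature vector, and a word's representation is learned jointly from these local structural features and the radical-derived semantic information.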
Acknowledgements
This work was supported by the Fundamental Research Funds for the Central Universities (No.2019XD-A20).
Ethics declarations
Disclosure
Conflict of Interest: The authors declare that they have no conflict of interest.
Cite this article
Wang, S., Zhou, W. & Zhou, Q. Radical and Stroke-Enhanced Chinese Word Embeddings Based on Neural Networks. Neural Process Lett 52, 1109–1121 (2020). https://doi.org/10.1007/s11063-020-10289-6