Abstract
The internal structural information of words has proven highly effective for learning Chinese word embeddings. However, most previous attempts extracted only a single form of internal feature when learning representations, ignoring the comprehensive combination of such information. Moreover, they focused only on the explicit features of internal structures, even though these structures also carry implicit word semantics. In this paper, we propose Radical and Stroke-enhanced Word Embeddings (RSWE), a novel neural-network-based method for learning Chinese word embeddings with joint guidance from semantic and morphological internal information. RSWE enables an embedding model to learn simultaneously from (1) implicit semantic information exploited from radicals, and (2) stroke n-gram information explicitly obtained from Chinese words. During learning, RSWE uses stroke n-grams to capture the local structural features of words and integrates the implicit information exploited from radicals to enhance the semantics of the embeddings. Through this combination, the semantics of Chinese words are effectively transferred into the learned embeddings. We evaluate the effectiveness of RSWE on word similarity computation, word analogy reasoning, performance over embedding dimensions, performance over learning corpus size, and named entity recognition. The experimental results show that our model outperforms existing state-of-the-art approaches.
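To make the stroke n-gram idea concrete: in the cw2vec convention that this line of work builds on, each character's strokes are encoded as integer IDs (1 horizontal, 2 vertical, 3 left-falling, 4 right-falling, 5 turning), a word's stroke sequence is the concatenation of its characters' strokes, and all contiguous subsequences within a size window are enumerated as features. The sketch below is illustrative only; the function name, ID scheme, and window sizes are assumptions, not details taken from the paper.

```python
def stroke_ngrams(strokes, n_min=3, n_max=5):
    """Enumerate contiguous stroke n-grams from a word's stroke sequence.

    `strokes` is a list of stroke-type IDs (cw2vec convention:
    1=horizontal, 2=vertical, 3=left-falling, 4=right-falling, 5=turning).
    Returns every contiguous subsequence of length n_min..n_max.
    """
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(strokes) - n + 1):
            grams.append(tuple(strokes[i:i + n]))
    return grams

# Example: a four-stroke sequence yields two 3-grams and one 4-gram.
print(stroke_ngrams([1, 2, 3, 4]))
```

Each resulting n-gram would then index a feature vector, and a word's representation is learned jointly from these local structural features and the radical-derived semantic information.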
Acknowledgements
This work was supported by the Fundamental Research Funds for the Central Universities (No.2019XD-A20).
Ethics declarations
Disclosure
Conflict of Interest: The authors declare that they have no conflict of interest.
Cite this article
Wang, S., Zhou, W. & Zhou, Q. Radical and Stroke-Enhanced Chinese Word Embeddings Based on Neural Networks. Neural Process Lett 52, 1109–1121 (2020). https://doi.org/10.1007/s11063-020-10289-6