Abstract
Chinese word embedding has attracted considerable attention in natural language processing. Existing methods model the relation between a target word and its neighbouring context words. However, Chinese sentences frequently contain neighbouring words that are semantically irrelevant to the target word, which limits the ability of these methods to capture the semantics of Chinese words. In this study, we designed sc2vec, which learns Chinese word embeddings from a similar context, reducing the influence of irrelevant neighbours and capturing the relevant semantics of Chinese words. To strengthen the learning architecture, sc2vec is modelled with reinforcement learning, treating the continuous bag-of-words and skip-gram models as two actions available to an agent over the corpus, so as to generate high-quality Chinese word embeddings. Results on word analogy, word similarity, named entity recognition, and text classification tasks demonstrate that the proposed model outperforms most state-of-the-art approaches.
Data availability
The text data used to support the findings of this study are available at http://www.sogou.com/labs/resource/ca.php
Code availability
The code is written in Python with PyTorch and will be released once the paper is accepted.
Notes
\(\begin{aligned} E\left(b\nabla \log p_{\theta } (\tau )\right) & = \sum\limits_{\tau } p_{\theta } (\tau )\,\nabla \log p_{\theta } (\tau )\,b \\ & = \sum\limits_{\tau } p_{\theta } (\tau )\,\frac{\nabla p_{\theta } (\tau )}{p_{\theta } (\tau )}\,b \\ & = \sum\limits_{\tau } \nabla p_{\theta } (\tau )\,b \\ & = \nabla _{\theta } \left(\sum\limits_{\tau } p_{\theta } (\tau )\right) b = (\nabla _{\theta } 1)\,b = 0. \\ \end{aligned}\)
Setting \(F=(R\left( \tau ^n\right) -b)\nabla \log p_\theta \left( \tau \right)\), the variance is \(\mathrm {Var}\left( F\right) =E\left[ (F-E(F))^2\right] =E(F^2)-(E(F))^2\). To minimize this variance, we require \(\frac{\partial \mathrm {Var}(F)}{\partial b}=0\). Since \((E(F))^2\) does not depend on b (the baseline term has zero expectation, as shown above), the condition reduces to \(\frac{\partial \mathrm {Var}(F)}{\partial b}=E\left( F\frac{\partial F}{\partial b}\right) =0\), from which b is obtained as in Eq. (11).
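The two properties used in this derivation, that the baseline term \(b\nabla \log p_\theta(\tau)\) has zero expectation and that a variance-minimizing b exists, can be checked numerically. The sketch below uses a two-outcome Bernoulli "policy" as a deliberately simple stand-in for the trajectory distribution \(p_\theta(\tau)\); the distribution, reward values, and all names here are illustrative assumptions, not the paper's actual setup.

```python
# Numeric check of the baseline argument, on a toy Bernoulli "policy"
# p_theta(1) = theta, p_theta(0) = 1 - theta (illustrative stand-in).
theta = 0.3
taus = [0, 1]
probs = {0: 1 - theta, 1: theta}
# Score function g(tau) = d/dtheta log p_theta(tau).
grads = {0: -1.0 / (1 - theta), 1: 1.0 / theta}
rewards = {0: 1.0, 1: 5.0}  # hypothetical returns R(tau)

def expect(f):
    """Exact expectation of f(tau) under p_theta."""
    return sum(probs[t] * f(t) for t in taus)

# 1) The baseline term has zero expectation: E[b * grad log p] = 0.
b = 2.0
assert abs(expect(lambda t: b * grads[t])) < 1e-12

# 2) Variance of F = (R - b) * grad log p, as a function of b.
def variance(b):
    f = lambda t: (rewards[t] - b) * grads[t]
    return expect(lambda t: f(t) ** 2) - expect(f) ** 2

# Solving E[F * dF/db] = 0 gives b* = E[R g^2] / E[g^2].
b_star = (expect(lambda t: rewards[t] * grads[t] ** 2)
          / expect(lambda t: grads[t] ** 2))

assert variance(b_star) <= variance(0.0)            # baseline reduces variance
assert variance(b_star) <= variance(b_star + 0.5)   # b* is the minimizer
```

With only two outcomes the optimal baseline makes F constant, so its variance drops to exactly zero; in general it only minimizes, rather than eliminates, the variance.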
Acknowledgements
This research was supported in part by the National Key R&D Program of China under grants 2017YFC1703905 and 2018YFC1704105, the Natural Science Foundation of Sichuan Province under grant 2022NSFSC0958, the Sichuan Science and Technology Program under grants 2020YFS0372 and 2020YFS0302, and the Fundamental Research Funds for the Central Universities ZYGX2021YGLH012. We would like to thank Editage (www.editage.cn) for English language editing.
Author information
Authors and Affiliations
Contributions
YZ and YL performed conceptualization; YZ developed the methodology; DL and SZ performed formal analysis and investigation; YZ wrote the original draft; YL, DL, and SZ contributed to writing, review, and editing; YL acquired funding; YZ provided resources; YL, DL, and SZ supervised the study.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that there are no conflicts of interest regarding the publication of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, Y., Liu, Y., Li, D. et al. Exploring Chinese word embedding with similar context and reinforcement learning. Neural Comput & Applic 34, 22287–22302 (2022). https://doi.org/10.1007/s00521-022-07672-w