SPSY: a semantic synthesis framework for lexical sememe prediction and its applications

The Journal of Supercomputing

Abstract

In the era of large language models, capturing fine-grained semantics remains critical, as these models often overlook subtle semantic nuances. Sememes, the smallest units of meaning, are essential for enriching semantic representations. However, existing sememe prediction methods rely solely on intrinsic word features or dictionary definitions, neglecting the potential of subword information to bridge the gap between them. This limitation results in poor performance in predicting out-of-vocabulary (OOV) and low-frequency words. To address this, we propose the Sememe Prediction through Semantic Synthesis (SPSY) framework, which integrates subword-level information with dictionary definitions. This approach enhances sensitivity to subtle semantic variations, significantly improving prediction accuracy. Evaluations on the HowNet and WordNet datasets show that our framework outperforms existing models, achieving a 2.91% gain in mean average precision for the Chinese dataset and a 5.54% gain for the English dataset. It also achieves state-of-the-art performance, surpassing previous models by at least 2.88% across all word frequencies and by 4.11% for OOV words. Furthermore, the framework demonstrates its versatility through successful applications in industrial knowledge graph verification and entity recognition.
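The gains above are reported in mean average precision (MAP). As a minimal sketch of how that metric is commonly computed for multi-label sememe prediction, the following assumes each word has a ranked list of predicted sememes and a gold set of annotated sememes. The function names and the toy sememe labels are illustrative assumptions, not taken from the paper, and the authors' exact evaluation protocol may differ.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], gold: Set[str]) -> float:
    """Average precision of one ranked prediction list against a gold set."""
    if not gold:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, sememe in enumerate(ranked, start=1):
        if sememe in gold:
            hits += 1
            # Precision at this rank, counted only where a gold sememe appears.
            precision_sum += hits / rank
    # Normalise by the number of gold sememes, so missed sememes lower the score.
    return precision_sum / len(gold)


def mean_average_precision(
    predictions: Dict[str, List[str]], gold: Dict[str, Set[str]]
) -> float:
    """Mean of the per-word average precision over all words in the gold data."""
    scores = [average_precision(predictions.get(w, []), gold[w]) for w in gold]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: two words with ranked sememe predictions.
toy_predictions = {
    "word_a": ["human", "tool", "occupation"],
    "word_b": ["place", "animal"],
}
toy_gold = {
    "word_a": {"human", "occupation"},
    "word_b": {"animal"},
}
toy_map = mean_average_precision(toy_predictions, toy_gold)
```

In this toy case, word_a scores (1/1 + 2/3) / 2 and word_b scores 1/2, so the MAP is their mean. Ranking a correct sememe lower reduces the score, which is why MAP rewards models that place gold sememes near the top of the list.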

Acknowledgements

This research is supported by the National Key Research and Development Program of China (2020AAA0109300) and the Shanghai Collaborative Innovation Center of data intelligence technology (No. 0232-A1-8900-24-13).

Author information

Corresponding author

Correspondence to Jianpeng Hu.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethics approval

We confirm that this manuscript has not been published elsewhere and is not under consideration by another journal. All authors have approved the manuscript and agree with its submission to Supercomputing Journal.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A List of acronyms

This appendix lists all acronyms used throughout the paper along with their full forms. Table 9 is intended to help readers follow the terminology.

Table 9 List of acronyms and their full forms

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wen, T., Hu, J., Zhao, J. et al. SPSY: a semantic synthesis framework for lexical sememe prediction and its applications. J Supercomput 81, 552 (2025). https://doi.org/10.1007/s11227-025-07070-8

  • DOI: https://doi.org/10.1007/s11227-025-07070-8
