Skip to main content

Advertisement

Log in

ProSyno: context-free prompt learning for synonym discovery

  • Research Article
  • Published:
Frontiers of Computer Science Aims and scope Submit manuscript

Abstract

Synonym discovery is important in a wide variety of concept-related tasks, such as entity/concept mining and industrial knowledge graph (KG) construction. It intends to determine whether two terms refer to the same concept in semantics. Existing methods rely on contexts or KGs. However, these methods are often impractical in some cases where contexts or KGs are not available. Therefore, this paper proposes a context-free prompt learning based synonym discovery method called ProSyno, which takes the world’s largest freely available dictionary Wiktionary as a semantic source. Based on a pre-trained language model (PLM), we employ a prompt learning method to generalize to other datasets without any fine-tuning. Thus, our model is more appropriate for context-free situation and can be easily transferred to other fields. Experimental results demonstrate its superiority comparing with state-of-the-art methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Luo X, Bo L, Wu J, Li L, Luo Z, Yang Y, Yang K. AliCoCo2: commonsense knowledge extraction, representation and application in E-commerce. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021, 3385–3393

    Chapter  Google Scholar 

  2. Li M, Xing Y, Kong F, Zhou G. Towards better entity linking. Frontiers of Computer Science, 2022, 16(2): 162308

    Article  Google Scholar 

  3. Zhang M, He T, Dong M. Meta-path reasoning of knowledge graph for commonsense question answering. Frontiers of Computer Science, 2024, 18(1): 181303

    Article  Google Scholar 

  4. Xu D, Miller T. A simple neural vector space model for medical concept normalization using concept embeddings. Journal of Biomedical Informatics, 2022, 130: 104080

    Article  Google Scholar 

  5. Zhang C, Li Y, Du N, Fan W, Yu P S. Entity synonym discovery via multipiece bilateral context matching. In: Proceedings of the 29th International Joint Conference on Artificial Intelligence. 2021, 199

    Google Scholar 

  6. Pei S, Yu L, Zhang X. Set- aware entity synonym discovery with flexible receptive fields. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(1): 891–904

    Google Scholar 

  7. Yuan Z, Zhao Z, Sun H, Li J, Wang F, Yu S. CODER: knowledge-infused cross-lingual medical term embedding for term normalization. Journal of Biomedical Informatics, 2022, 126: 103983

    Article  Google Scholar 

  8. Garcia M. Exploring the representation of word meanings in context: a case study on homonymy and synonymy. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. 2021, 3625–3640

    Google Scholar 

  9. Miftahutdinov Z, Tutubalina E. Deep neural models for medical concept normalization in user-generated texts. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop. 2019, 393–399

    Chapter  Google Scholar 

  10. Wang Z, Yue X, Moosavinasab S, Huang Y, Lin S, Sun H. SurfCon: synonym discovery on privacy-aware clinical data. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019, 1578–1586

    Chapter  Google Scholar 

  11. Gao Y, Wang X, He X, Feng H, Zhang Y. Rumor detection with self-supervised learning on texts and social graph. Frontiers of Computer Science, 2023, 17(4): 174611

    Article  Google Scholar 

  12. Zhang N, Jia Q, Deng S, Chen X, Ye H, Chen H, Tou H, Huang G, Wang Z, Hua N, Chen H. AliCG: fine-grained and evolvable conceptual graph construction for semantic search at Alibaba. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021, 3895–3905

    Chapter  Google Scholar 

  13. Xie T, Wu B, Jia B, Wang B. Graph- ranking collective Chinese entity linking algorithm. Frontiers of Computer Science, 2020, 14(2): 291–303

    Article  Google Scholar 

  14. Wang C, He X, Zhou A. A short survey on taxonomy learning from text corpora: Issues, resources and recent advances. In: Proceedings of 2017 Conference on Empirical Methods in Natural Language Processing. 2017, 1190–1203

    Google Scholar 

  15. Zhang J, Trujillo L B, Li T, Tanwar A, Freire G, Yang X, Ive J, Gupta V, Guo Y. Self-supervised detection of contextual synonyms in a multi-class setting: Phenotype annotation use case. In: Proceedings of 2021 Conference on Empirical Methods in Natural Language Processing. 2021, 8754–8769

    Chapter  Google Scholar 

  16. Zhang T, Cai Z, Wang C, Qiu M, Yang B, He X. SMedBERT: a knowledge-enhanced pre-trained language model with structured semantics for medical text mining. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. 2021, 5882–5893

    Google Scholar 

  17. Yang Y, Yin X, Yang H, Fei X, Peng H, Zhou K, Lai K, Shen J. KGSynNet: a novel entity synonyms discovery framework with knowledge graph. In: Proceedings of the 26th International Conference. 2021, 174–190

    Google Scholar 

  18. Wang C, Qiu M, Huang J, He X. KEML: a knowledge-enriched meta-learning framework for lexical relation classification. In: Proceedings of the 35th AAAI Conference on Artificial Intelligence. 2021, 13924–13932

    Google Scholar 

  19. Shen J, Lyu R, Ren X, Vanni M, Sadler B, Han J. Mining entity synonyms with efficient neural set generation. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence. 2019, 249–256

    Google Scholar 

  20. Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language models are unsupervised multitask learners. OpenAI Blog, 2019, 1(8): 9

    Google Scholar 

  21. Devlin J, Chang M W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of 2019 Conference of the North American Chapter of the Association for Computational Linguistics. 2019, 4171–4186

    Google Scholar 

  22. Zeng J, Wang Z, Yu Y, Wen J, Gao M. Word embedding methods in natural language processing: a review. Journal of Frontiers of Computer Science and Technology, 2024, 18(1): 24–43

    Google Scholar 

  23. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 2023, 55(9): 195

    Article  Google Scholar 

  24. Li X L, Liang P. Prefix-tuning: optimizing continuous prompts for generation. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. 2021, 4582–4597

    Google Scholar 

  25. Zhong Z, Friedman D, Chen D. Factual probing is [MASK]: learning vs. learning to recall. In: Proceedings of 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021, 5017–5033

    Google Scholar 

  26. Izbicki M. Aligning word vectors on low-resource languages with wiktionary. In: Proceedings of the 5th Workshop on Technologies for Machine Translation of Low-Resource Languages. 2022, 107–117

    Google Scholar 

  27. Bajčetić L, Declerck T. Using wiktionary to create specialized lexical resources and datasets. In: Proceedings of the 13th Conference on Language Resources and Evaluation. 2022

    Google Scholar 

  28. Fang Y, Wang S, Xu Y, Xu R, Sun S, Zhu C, Zeng M. Leveraging knowledge in multilingual commonsense reasoning. In: Proceedings of the Findings of the Association for Computational Linguistics. 2022, 3237–3246

    Google Scholar 

  29. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł, Polosukhin I. Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 5998–6008

    Google Scholar 

  30. Miller G A. WordNet: a lexical database for English. Communications of the ACM, 1995, 38(11): 39–41

    Article  Google Scholar 

  31. Limsopatham N, Collier N. Normalising medical concepts in social media texts by learning semantic representation. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2016, 1014–1023

    Google Scholar 

  32. Tutubalina E, Miftahutdinov Z, Nikolenko S, Malykh V. Medical concept normalization in social media posts with recurrent neural networks. Journal of Biomedical Informatics, 2018, 84: 93–102

    Article  Google Scholar 

  33. Xu D, Zhang Z, Bethard S. A generate-and-rank framework with semantic type regularization for biomedical concept normalization. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020, 8452–8464

    Chapter  Google Scholar 

  34. Lee J, Yoon W, Kim S, Kim D, Kim S, So C H, Kang J. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 2020, 36(4): 1234–1240

    Article  Google Scholar 

  35. Xie Z, Zeng N. A mixture-of-experts model for antonym-synonym discrimination. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. 2021, 558–564

    Google Scholar 

  36. Bojanowski P, Grave E, Joulin A, Mikolov T. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 2017, 5: 135–146

    Article  Google Scholar 

Download references

Acknowledgements

This work was supported by the National Key R&D Program of China (2023YFC3304104) and the National Natural Science Foundation of China (Grant No. 62172094).

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Jiayue Li, Dongyuan Lu or Nan Zheng.

Ethics declarations

Competing interests The authors declare that they have no competing interests or financial conflicts to disclose.

Additional information

Song Zhang is a PhD candidate at Institute of Automation, Chinese Academy of Sciences (CAS), China. His research interests include NLP and machine learning.

Lei He is a senior research engineer at Machine Learning Platform Department in Tencent, China. She received the PhD degree from the Institute of Computing Technology, CAS, China in 2018. Her research interests include NLP and machine learning.

Dong Wang is an algorithm engineer at Tencent, China. He received the MS degree from Tsinghua University, China in 2021. His research interests include NLP, deep learning and KG.

Hongyun Bao is an associate professor in Institute of Automation, CAS, China. She received the PhD degree from Institute of Automation, CAS, China in 2013. Her research interests include KG construction and information extraction.

Suncong Zheng is responsible for Tencent’s Lexical tools, Tencent’s large-scale knowledge graph Topbase. He received the PhD degree from Institute of automation, CAS, China in 2017 and obtained ACL-2017 outstanding paper award. His research interests include information extraction, KB-QA and recommendation.

Yuqiao Liu is studying for a master’s degree at CAS, China. His research interests include recommendation system and data mining.

Baihua Xiao is a professor in Institute of Automation, CAS, China. He received the BS degree in automatic control from Northwestern Polytechnical University, China in 1995, and the PhD degree in computer science from Institute of Automation, CAS, China in 2000. His research interests include pattern recognition, computer vision, image processing, and machine learning.

Jiayue Li received his PhD degree in computer science and engineering from The Hong Kong University of Science and Technology, China. He did postdoctoral research in Arizona State University, USA from 2018 to 2019. His research mainly focuses on pattern recognition, medical imaging, and distributed ledger technology.

Dongyuan Lu is a professor in University of International Business and Economics, China. She received her PhD degree from Institute of Automation, CAS, China in 2012. Her research interests include data mining and natural language processing.

Nan Zheng is an associate professor at Institute of Automation, CAS, China. She received the PhD degree from Institute of Automation, CAS, China in 2012. Her research interests include data mining and machine learning. She was a visiting scholar at University of California, Berkeley, USA in 2019.

Electronic supplementary material

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhang, S., He, L., Wang, D. et al. ProSyno: context-free prompt learning for synonym discovery. Front. Comput. Sci. 19, 196317 (2025). https://doi.org/10.1007/s11704-024-3900-z

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11704-024-3900-z

Keywords