Abstract
Synonym discovery is important in a wide variety of concept-related tasks, such as entity/concept mining and industrial knowledge graph (KG) construction. It aims to determine whether two terms refer to the same concept. Existing methods rely on contexts or KGs, which makes them impractical when neither is available. Therefore, this paper proposes ProSyno, a context-free synonym discovery method based on prompt learning that takes Wiktionary, the world's largest freely available dictionary, as its semantic source. Built on a pre-trained language model (PLM), ProSyno uses prompt learning to generalize to other datasets without any fine-tuning, making it well suited to context-free settings and easy to transfer to other fields. Experimental results demonstrate its superiority over state-of-the-art methods.
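To make the abstract's idea concrete, the following is a minimal sketch of context-free, prompt-based synonym scoring with a frozen masked language model over dictionary-style glosses. The cloze template, the "same"/"different" verbalizer words, the synonym_score helper, and the bert-base-uncased backbone are illustrative assumptions, not the actual ProSyno design.

# Minimal sketch: score whether two terms are synonyms from their glosses only,
# using a frozen masked LM and a hand-written cloze prompt (no fine-tuning).
# Template, verbalizer, and model choice are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def synonym_score(term_a: str, gloss_a: str, term_b: str, gloss_b: str) -> float:
    """Return the probability that the PLM fills the relation slot with 'same'
    rather than 'different', given only the two terms and their glosses."""
    prompt = (
        f"{term_a} means {gloss_a}. {term_b} means {gloss_b}. "
        f"The two terms refer to the {tokenizer.mask_token} concept."
    )
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Locate the [MASK] position and read the verbalizer probabilities there.
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    probs = logits[0, mask_pos].softmax(dim=-1)
    same_id = tokenizer.convert_tokens_to_ids("same")
    diff_id = tokenizer.convert_tokens_to_ids("different")
    return (probs[same_id] / (probs[same_id] + probs[diff_id])).item()

# Example with made-up Wiktionary-style glosses:
print(synonym_score("car", "a wheeled motor vehicle used for transport",
                    "automobile", "a self-propelled passenger vehicle"))

In practice, learned prompt tokens and a tuned verbalizer would likely replace the hand-written template, but the sketch captures the core idea of judging synonymy from definitions alone, without context sentences or a KG.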
Acknowledgements
This work was supported by the National Key R&D Program of China (2023YFC3304104) and the National Natural Science Foundation of China (Grant No. 62172094).
Ethics declarations
Competing interests: The authors declare that they have no competing interests or financial conflicts to disclose.
Additional information
Song Zhang is a PhD candidate at the Institute of Automation, Chinese Academy of Sciences (CAS), China. His research interests include NLP and machine learning.
Lei He is a senior research engineer at the Machine Learning Platform Department of Tencent, China. She received her PhD degree from the Institute of Computing Technology, CAS, China in 2018. Her research interests include NLP and machine learning.
Dong Wang is an algorithm engineer at Tencent, China. He received the MS degree from Tsinghua University, China in 2021. His research interests include NLP, deep learning and KG.
Hongyun Bao is an associate professor at the Institute of Automation, CAS, China. She received her PhD degree from the Institute of Automation, CAS, China in 2013. Her research interests include KG construction and information extraction.
Suncong Zheng is responsible for Tencent's lexical tools and Tencent's large-scale knowledge graph Topbase. He received his PhD degree from the Institute of Automation, CAS, China in 2017 and received an ACL 2017 Outstanding Paper Award. His research interests include information extraction, KB-QA, and recommendation.
Yuqiao Liu is studying for a master's degree at CAS, China. His research interests include recommender systems and data mining.
Baihua Xiao is a professor at the Institute of Automation, CAS, China. He received his BS degree in automatic control from Northwestern Polytechnical University, China in 1995, and his PhD degree in computer science from the Institute of Automation, CAS, China in 2000. His research interests include pattern recognition, computer vision, image processing, and machine learning.
Jiayue Li received his PhD degree in computer science and engineering from The Hong Kong University of Science and Technology, China. He did postdoctoral research at Arizona State University, USA from 2018 to 2019. His research mainly focuses on pattern recognition, medical imaging, and distributed ledger technology.
Dongyuan Lu is a professor at the University of International Business and Economics, China. She received her PhD degree from the Institute of Automation, CAS, China in 2012. Her research interests include data mining and natural language processing.
Nan Zheng is an associate professor at the Institute of Automation, CAS, China. She received her PhD degree from the Institute of Automation, CAS, China in 2012. Her research interests include data mining and machine learning. She was a visiting scholar at the University of California, Berkeley, USA in 2019.
About this article
Cite this article
Zhang, S., He, L., Wang, D. et al. ProSyno: context-free prompt learning for synonym discovery. Front. Comput. Sci. 19, 196317 (2025). https://doi.org/10.1007/s11704-024-3900-z