Sememe knowledge computation: a review of recent advances in application and expansion of sememe knowledge bases

Qi, Fanchao; Xie, Ruobing; Zang, Yuan; Liu, Zhiyuan; Sun, Maosong

doi:10.1007/s11704-020-0002-4

Review Article
Published: 29 April 2021

Volume 15, article number 155327 (2021)

Frontiers of Computer Science

Fanchao Qi^1,2,3,
Ruobing Xie^4,
Yuan Zang^1,2,3,
Zhiyuan Liu^1,2,3,5 &
Maosong Sun^1,2,3,5

Abstract

A sememe is defined in linguistics as the minimum semantic unit of language. Sememe knowledge bases are built by manually annotating words and phrases with sememes. HowNet is the best-known sememe knowledge base. It was used extensively in natural language processing tasks during the era of statistical natural language processing and proved effective and helpful for understanding and using language. In the era of deep learning, although data are thought to be of vital importance, some studies have incorporated sememe knowledge bases such as HowNet into neural network models to improve system performance. Successful attempts have been made in tasks including word representation learning, language modeling, and semantic composition. In addition, given the high cost of manually annotating and updating sememe knowledge bases, some work has used machine learning methods to predict sememes for words and phrases automatically and thereby expand sememe knowledge bases. Other studies try to extend HowNet to other languages by automatically predicting sememes for words and phrases in a new language. In this paper, we summarize recent studies on the application and expansion of sememe knowledge bases and point out some future directions of research on sememes.
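As a rough illustration of what a sememe annotation looks like as data, the sketch below stores a hand-written mapping from words to sememe sets and scores two words by the overlap of their sememes. The words, the sememe labels, and the names `TOY_SEMEMES` and `sememe_similarity` are all hypothetical and chosen for illustration. They are not HowNet entries, and Jaccard overlap is only one simple choice of measure, not a method taken from the article.

```python
# Toy sememe annotations. The words and sememe labels below are
# illustrative assumptions, not entries copied from HowNet.
TOY_SEMEMES = {
    "apple": {"fruit", "edible", "plant"},
    "pear": {"fruit", "edible", "plant"},
    "laptop": {"computer", "tool", "portable"},
    "tablet": {"computer", "tool", "portable", "touch"},
}


def sememe_similarity(word_a, word_b, annotations=TOY_SEMEMES):
    """Return the Jaccard overlap between the sememe sets of two words.

    Words missing from the annotations are treated as having no sememes.
    """
    sememes_a = annotations.get(word_a, set())
    sememes_b = annotations.get(word_b, set())
    union = sememes_a | sememes_b
    if not union:
        # Neither word is annotated, so there is nothing to compare.
        return 0.0
    return len(sememes_a & sememes_b) / len(union)


if __name__ == "__main__":
    for pair in [("apple", "pear"), ("laptop", "tablet"), ("apple", "laptop")]:
        print(pair, sememe_similarity(*pair))
```

Under these toy annotations, words that share every sememe score 1.0 and words that share none score 0.0. A real knowledge base would also encode structure among the sememes, which this flat set representation ignores.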

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2018YFB1004503) and the National Natural Science Foundation of China (NSFC Grant Nos. 61732008, 61532010). We also thank the anonymous reviewers for their comments.

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Fanchao Qi, Yuan Zang, Zhiyuan Liu & Maosong Sun
Institute for Artificial Intelligence, Tsinghua University, Beijing, 100084, China
Fanchao Qi, Yuan Zang, Zhiyuan Liu & Maosong Sun
Beijing National Research Center for Information Science and Technology, Beijing, 100084, China
Fanchao Qi, Yuan Zang, Zhiyuan Liu & Maosong Sun
Search Product Center, WeChat Search Application Department, Tencent, Beijing, 100080, China
Ruobing Xie
Beijing Academy of Artificial Intelligence, Beijing, 100191, China
Zhiyuan Liu & Maosong Sun

Corresponding author

Correspondence to Zhiyuan Liu.

Additional information

Fanchao Qi is a PhD student in the Department of Computer Science and Technology, Tsinghua University, China. He received his BEng degree in 2017 from the Department of Electronic Engineering, Tsinghua University, China. His research interests include natural language processing and computational semantics. He has published papers at international conferences including AAAI, ACL and EMNLP.

Ruobing Xie is a researcher at WeChat, Tencent. He received his BEng degree in 2014 and his master's degree in 2017 from the Department of Computer Science and Technology, Tsinghua University, China. His research interests are natural language processing and recommender systems. He has published over 15 papers in international journals and conferences including IJCAI, AAAI, ACL and EMNLP.

Yuan Zang is an undergraduate student in the Department of Computer Science and Technology, Tsinghua University, China. His research interests lie in natural language processing and adversarial learning. He has published papers at ACL.

Zhiyuan Liu is an associate professor in the Department of Computer Science and Technology, Tsinghua University, China. He received his BEng degree in 2006 and his PhD in 2011 from the Department of Computer Science and Technology, Tsinghua University, China. His research interests are natural language processing and social computation. He has published over 40 papers in international journals and conferences including ACM Transactions, IJCAI, AAAI, ACL and EMNLP.

Maosong Sun is a professor in the Department of Computer Science and Technology, Tsinghua University, China. He received his BEng degree in 1986 and his MEng degree in 1988 from the Department of Computer Science and Technology, Tsinghua University, and his PhD degree in 2004 from the Department of Chinese, Translation, and Linguistics, City University of Hong Kong, China. His research interests include natural language processing, Chinese computing, Web intelligence, and computational social sciences. He has published over 150 papers in academic journals and international conferences in these fields. He serves as a council member of the China Computer Federation, the director of the Massive Online Education Research Center of Tsinghua University, and the Editor-in-Chief of the Journal of Chinese Information Processing.
