Abstract
Community-oriented lexical simplification aims to replace complex words in a sentence with simpler, semantically consistent substitutes drawn from a community-specific vocabulary. Most state-of-the-art approaches use contextual word embedding models to generate substitutes from the context surrounding a complex word. Although these models take context into account, they fail to capture the rich semantics of polysemous complex words, and so produce many spurious and semantically non-equivalent candidates. This paper therefore proposes a novel Semantic-Context-Aware framework for Community-oriented Lexical Simplification (SCA-CLS), which integrates glosses (sense definitions) into BERT to identify the actual sense of a complex word, especially a polysemous one, in its current context, and ranks substitutes by a proposed gloss similarity measure. In addition, a new complexity feature is proposed to enhance substitute ranking. Experimental results on a Wikipedia dataset show that SCA-CLS outperforms the state-of-the-art Merge-Sort model on both the substitute generation and substitute ranking tasks, indicating its effectiveness for community-oriented lexical simplification.
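The idea of ranking substitutes by gloss similarity can be illustrated with a minimal sketch. It is not the SCA-CLS implementation: the abstract does not specify how gloss similarity is computed, so a bag-of-words cosine is used here as a toy stand-in for BERT-based gloss representations, and the sense of the complex word is assumed to have been identified already. The function names, the example glosses, and the community vocabulary are all hypothetical.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts for a gloss (toy stand-in for a BERT embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_by_gloss_similarity(target_gloss, candidate_glosses, community_vocab):
    """Keep candidates found in the community vocabulary and sort them by
    similarity between their gloss and the gloss of the complex word's sense."""
    target_vec = bow(target_gloss)
    scored = [
        (word, cosine(target_vec, bow(gloss)))
        for word, gloss in candidate_glosses.items()
        if word in community_vocab
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: the gloss of the identified sense of a complex word,
# candidate substitutes with their glosses, and a community-specific vocabulary.
target_gloss = "a financial institution that accepts deposits and lends money"
candidates = {
    "lender": "an institution or person that lends money",
    "shore": "the land along the edge of a body of water",
    "depository": "a financial institution that accepts deposits",
}
vocab = {"lender", "shore"}

ranking = rank_by_gloss_similarity(target_gloss, candidates, vocab)
for word, score in ranking:
    print(f"{word}: {score:.3f}")
```

In this sketch a candidate outside the community vocabulary is discarded before ranking, and the remaining candidates are ordered by how closely their glosses match the gloss of the intended sense, which is what filters out candidates belonging to a different sense of a polysemous word.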
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, R., Xie, W., Lee, J., Hao, T. (2023). SCA-CLS: A New Semantic-Context-Aware Framework for Community-Oriented Lexical Simplification. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol 14302. Springer, Cham. https://doi.org/10.1007/978-3-031-44693-1_6
DOI: https://doi.org/10.1007/978-3-031-44693-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44692-4
Online ISBN: 978-3-031-44693-1
eBook Packages: Computer Science, Computer Science (R0)