SCA-CLS: A New Semantic-Context-Aware Framework for Community-Oriented Lexical Simplification

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14302)

Abstract

Community-oriented lexical simplification aims to replace complex words in a sentence with simpler, semantically consistent substitutes drawn from a community-specific vocabulary. Most state-of-the-art contextual word embedding models generate substitutes by extracting contextual information about the complex word. Although these models take context into account, they fail to capture the rich semantics of polysemous complex words, which produces many spurious and semantically non-equivalent candidates. This paper therefore proposes a novel Semantic-Context-Aware framework for Community-oriented Lexical Simplification (SCA-CLS), which integrates glosses (sense definitions) into BERT to identify the actual sense of the complex word in the current context, particularly for polysemous words, and ranks substitutes by a proposed gloss similarity measure. In addition, a new complexity feature is proposed to enhance substitute ranking. Experimental results on a Wikipedia dataset show that SCA-CLS outperforms the state-of-the-art Merge-Sort model on both the substitute generation and substitute ranking tasks, indicating its effectiveness for community-oriented lexical simplification.
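
The gloss-based ranking idea can be illustrated with a toy sketch. The abstract does not specify how gloss similarity is computed, so the code below is only an assumption-laden stand-in: it swaps the BERT-based gloss encoding for a plain bag-of-words cosine similarity, and the function names (`bow`, `cosine`, `rank_substitutes`), the glosses, and the community vocabulary are all hypothetical. It shows the general shape of the step: keep only candidates that belong to the community vocabulary, then order them by how close their gloss is to the gloss of the complex word's identified sense.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector (toy stand-in for a BERT gloss encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_substitutes(target_gloss, candidate_glosses, community_vocab):
    """Filter candidates by the community vocabulary, then rank by gloss similarity."""
    target_vec = bow(target_gloss)
    scored = [
        (word, cosine(target_vec, bow(gloss)))
        for word, gloss in candidate_glosses.items()
        if word in community_vocab  # hypothetical community-specific word list
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical glosses, invented for illustration only.
target_gloss = "a financial institution that accepts deposits"
candidate_glosses = {
    "lender": "a financial institution that lends money",
    "shore": "the land along the edge of a river",
    "depository": "a facility where things can be deposited",
}
community_vocab = {"lender", "shore"}

ranking = rank_substitutes(target_gloss, candidate_glosses, community_vocab)
for word, score in ranking:
    print(f"{word}: {score:.3f}")
```

In this toy run, "depository" is dropped because it is outside the assumed community vocabulary, and "lender" ranks above "shore" because its gloss shares more words with the target gloss. A real system would replace `bow` and `cosine` with contextual gloss representations and would combine the score with other ranking features, such as the complexity feature the abstract mentions.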


Author information

Corresponding author

Correspondence to Tianyong Hao.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, R., Xie, W., Lee, J., Hao, T. (2023). SCA-CLS: A New Semantic-Context-Aware Framework for Community-Oriented Lexical Simplification. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol 14302. Springer, Cham. https://doi.org/10.1007/978-3-031-44693-1_6

  • DOI: https://doi.org/10.1007/978-3-031-44693-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44692-4

  • Online ISBN: 978-3-031-44693-1

  • eBook Packages: Computer Science, Computer Science (R0)
