
Aspect-Based Semantic Textual Similarity for Educational Test Items

Conference paper in Artificial Intelligence in Education (AIED 2024)

Abstract

In the educational domain, identifying similarity among test items provides various advantages for exam quality management and personalized student learning. Existing studies have mostly relied on student performance data, such as the number of correct or incorrect answers, to measure item similarity. However, the nuanced semantic information within the test items has been overlooked, possibly due to the lack of similarity-labeled data. Human-annotated educational data demands high-cost expertise, and items comprising multiple aspects, such as questions and choices, require detailed annotation criteria. In this paper, we introduce the task of aspect-based semantic textual similarity for educational test items (aSTS-EI), in which similarity is assessed by specific aspects within test items, and we present an LLM-guided benchmark dataset. We report baseline performance by extending existing STS methods, setting the groundwork for future aSTS-EI research. In addition, to assist data-scarce settings, we propose a progressive augmentation (ProAug) method, which generates item aspects step by step via recursive prompting. Experimental results indicate that existing STS methods are effective for shorter aspects while underlining the need for specialized approaches to relatively longer aspects. Nonetheless, the markedly improved results with ProAug highlight the value of our augmentation strategy in overcoming data scarcity.


Notes

  1. https://github.com/doheejin/aSTS-EI.

  2. https://github.com/iamyuanchung/TOEFL-QA.

  3. Two proficient English teachers with 100% job success were employed via Upwork, https://www.upwork.com.


Acknowledgments

This work was partly supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2022-0-00223, Development of digital therapeutics to improve communication ability of autism spectrum disorder patients), (No. 2019-0-01906, Artificial Intelligence Graduate School Program (POSTECH)), and Smart HealthCare Program funded by the Korean National Police Agency (KNPA) (No. 220222M01).

Author information

Correspondence to Heejin Do.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Do, H., Lee, G.G. (2024). Aspect-Based Semantic Textual Similarity for Educational Test Items. In: Olney, A.M., Chounta, I.A., Liu, Z., Santos, O.C., Bittencourt, I.I. (eds) Artificial Intelligence in Education. AIED 2024. Lecture Notes in Computer Science, vol. 14830. Springer, Cham. https://doi.org/10.1007/978-3-031-64299-9_30


  • DOI: https://doi.org/10.1007/978-3-031-64299-9_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-64298-2

  • Online ISBN: 978-3-031-64299-9

  • eBook Packages: Computer Science, Computer Science (R0)
