
Reducing the Cost: Cross-Prompt Pre-finetuning for Short Answer Scoring

  • Conference paper
Artificial Intelligence in Education (AIED 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13916)
Abstract

Automated Short Answer Scoring (SAS) is the task of automatically scoring a given response to a prompt based on rubrics and reference answers. Although SAS is useful in real-world applications, both rubrics and reference answers differ between prompts, so new data must be collected and a model trained for each new prompt. These requirements are costly, especially for schools and online courses where resources are limited and only a few prompts are used. In this work, we attempt to reduce this cost through a two-phase approach: first pre-finetune a model on existing rubrics and answers with gold score signals, then finetune it on a new prompt. Specifically, since scoring rubrics and reference answers differ for each prompt, we utilize key phrases, i.e., representative expressions that an answer should contain to earn a higher score, and train a SAS model to learn the relationship between key phrases and answers using already annotated prompts (i.e., cross-prompt data). Our experimental results show that pre-finetuning on existing cross-prompt data with key phrases significantly improves scoring accuracy, especially when the training data for the new prompt is limited. Finally, our extensive analysis shows that it is crucial to design the model so that it can learn the task's general properties. We publicly release our code and all of the experimental settings for reproducing our results (https://github.com/hiro819/Reducing-the-cost-cross-prompt-prefinetuning-for-SAS).
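
As a rough illustration of the two-phase recipe above, the sketch below pairs each answer with its prompt's key phrases as input to a BERT-based regressor, pre-finetunes the regressor on already-annotated cross-prompt data, and then finetunes the same weights on the new prompt. This is a minimal sketch, not the authors' released implementation: the "key phrases [SEP] answer" input format, the regression head, the model name (borrowed from the cl-tohoku/bert-japanese repository mentioned in the notes), and all hyperparameters are illustrative assumptions.

import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel


class KeyPhraseScorer(nn.Module):
    """BERT encoder with a linear head that predicts a normalized score."""

    def __init__(self, model_name="cl-tohoku/bert-base-japanese-v2"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.head = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]        # [CLS] representation
        return self.head(cls).squeeze(-1)        # one scalar score per answer


def encode_batch(tokenizer, key_phrases, answers):
    # Pair the prompt's key phrases with each answer so the model can learn
    # the relationship between them ("key phrases [SEP] answer").
    return tokenizer(key_phrases, answers, truncation=True, padding=True,
                     return_tensors="pt")


def run_epoch(model, tokenizer, data, optimizer):
    # `data` is a list of (key_phrases, answer, normalized_gold_score) triples.
    model.train()
    for key_phrases, answer, gold in data:
        batch = encode_batch(tokenizer, [key_phrases], [answer])
        pred = model(batch["input_ids"], batch["attention_mask"])
        loss = nn.functional.mse_loss(pred, torch.tensor([float(gold)]))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


# Phase 1: pre-finetune on already-annotated cross-prompt data (many prompts,
#          each with its own key phrases).
# Phase 2: continue training the same weights on the new prompt's small dataset.
# model = KeyPhraseScorer()
# tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese-v2")
# optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
# run_epoch(model, tokenizer, cross_prompt_data, optimizer)   # phase 1
# run_epoch(model, tokenizer, new_prompt_data, optimizer)     # phase 2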

Notes

  1. https://github.com/hiro819/Reducing-the-cost-cross-prompt-prefinetuning-for-SAS.

  2. https://aip-nlu.gitlab.io/resources/sas-japanese.

  3. A type of question in which the student reads an essay and answers prompts about its content.

  4. We used pretrained Japanese BERT models from https://github.com/cl-tohoku/bert-japanese.

Acknowledgments

We are grateful to Dr. Paul Reisert for writing and editing assistance. This work was supported by JSPS KAKENHI Grant Numbers 22H00524 and JP19K12112, and by JST SPRING Grant Number JPMJSP2114. We also thank Takamiya Gakuen Yoyogi Seminar for providing invaluable data for our experiments, and the anonymous reviewers for their insightful comments.

Author information

Corresponding author

Correspondence to Hiroaki Funayama.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Funayama, H., Asazuma, Y., Matsubayashi, Y., Mizumoto, T., Inui, K. (2023). Reducing the Cost: Cross-Prompt Pre-finetuning for Short Answer Scoring. In: Wang, N., Rebolledo-Mendez, G., Matsuda, N., Santos, O.C., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2023. Lecture Notes in Computer Science (LNAI), vol 13916. Springer, Cham. https://doi.org/10.1007/978-3-031-36272-9_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-36272-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36271-2

  • Online ISBN: 978-3-031-36272-9

  • eBook Packages: Computer Science, Computer Science (R0)
