Automated Scoring for Reading Comprehension via In-context BERT Tuning

  • Conference paper
Artificial Intelligence in Education (AIED 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13355)

Abstract

Automated scoring of open-ended student responses has the potential to significantly reduce human grader effort. Recent advances in automated scoring leverage textual representations from pre-trained language models such as BERT. Existing approaches train a separate model for each item/question, which is suitable for scenarios like essay scoring where items differ from one another. However, these approaches have two limitations: 1) they fail to leverage item linkage in scenarios such as reading comprehension, where multiple items may share a reading passage; 2) they do not scale, since storing one model per item becomes impractical with large language models. We report our (grand prize-winning) solution to the National Assessment of Educational Progress (NAEP) automated scoring challenge for reading comprehension. Our approach, in-context BERT fine-tuning, produces a single shared scoring model for all items, with a carefully designed input structure that provides contextual information on each item. Our experiments demonstrate the effectiveness of our approach, which outperforms existing methods. We also perform a qualitative analysis and discuss the limitations of our approach. (The full version of the paper is available at https://arxiv.org/abs/2205.09864; our implementation is available at https://github.com/ni9elf/automated-scoring.)
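To make the shared-model idea concrete, the sketch below shows one plausible implementation of in-context scoring with a single BERT model, assuming the Hugging Face transformers library. Everything item-specific here (the input template, the score scale, the example question and response) is an illustrative assumption, not the authors' exact configuration; their actual implementation is in the repository linked above.

```python
# A minimal sketch of a single shared scoring model with an in-context
# input structure, assuming the Hugging Face `transformers` library.
# The input template, number of score classes, and example item below
# are illustrative assumptions, not the authors' exact configuration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"
NUM_SCORE_CLASSES = 3  # hypothetical; real items have item-specific scales

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_SCORE_CLASSES
)

def score_response(item_context: str, response: str) -> int:
    """Score one response with the shared model.

    The item's contextual information (e.g. the question text) goes in
    BERT's first segment and the student response in the second, so one
    set of weights can serve every item.
    """
    inputs = tokenizer(
        item_context,
        response,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1).item())

# Hypothetical item context and student response.
context = "Question: Why does the narrator return to the lighthouse?"
answer = "She promised her grandfather she would keep the light burning."
print(score_response(context, answer))
```

At fine-tuning time, training examples from all items would simply be mixed into one dataset; packing the item context into the first segment is what lets a single set of weights replace one model per item.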

Notes

  1. Run by the US Dept. of Education: https://github.com/NAEP-AS-Challenge/info.


Author information

Corresponding author

Correspondence to Andrew Lan.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Fernandez, N. et al. (2022). Automated Scoring for Reading Comprehension via In-context BERT Tuning. In: Rodrigo, M.M., Matsuda, N., Cristea, A.I., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2022. Lecture Notes in Computer Science, vol 13355. Springer, Cham. https://doi.org/10.1007/978-3-031-11644-5_69

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-11644-5_69

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-11643-8

  • Online ISBN: 978-3-031-11644-5

  • eBook Packages: Computer Science, Computer Science (R0)
