Abstract
Automated scoring of open-ended student responses has the potential to significantly reduce human grading effort. Recent approaches leverage textual representations from pre-trained language models such as BERT, training a separate model for each item/question; this is suitable for settings such as essay scoring, where items differ substantially from one another. However, these approaches have two limitations: 1) they fail to exploit item linkage in settings such as reading comprehension, where multiple items may share a reading passage; and 2) they do not scale, since storing one model per item is impractical with large language models. We report our (grand prize-winning) solution to the National Assessment of Educational Progress (NAEP) automated scoring challenge for reading comprehension. Our approach, in-context BERT fine-tuning, produces a single shared scoring model for all items, using a carefully designed input structure to provide contextual information on each item. Our experiments demonstrate that our approach outperforms existing methods. We also perform a qualitative analysis and discuss the limitations of our approach. (The full version of this paper is available at https://arxiv.org/abs/2205.09864 and our implementation at https://github.com/ni9elf/automated-scoring.)
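To make the idea concrete, the sketch below shows how a single shared BERT classifier might be fine-tuned in-context: the item's contextual information is packed into the input alongside the student response, so one model serves every item. This is a minimal sketch assuming the Hugging Face transformers API; the item-context format, score scale, and classification head are illustrative assumptions, not the paper's exact input structure (see the linked implementation for that).

```python
# Minimal sketch of in-context BERT fine-tuning for automated scoring.
# Illustrative only: the item-context format, score scale, and model head
# are assumptions, not the authors' exact implementation.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# A single shared model scores all items; 5 score classes assumed here.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=5
)

def build_input(item_context: str, response: str):
    """Pack the item's context (e.g., its question text) together with the
    student response, so one model can score responses to every item."""
    return tokenizer(
        item_context,   # segment A: identifies/describes the item
        response,       # segment B: the student response to score
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )

# Hypothetical item and response (not from the NAEP data).
enc = build_input(
    "Item 7: Why does the narrator return home at the end of the passage?",
    "He goes back because he realizes he misses his family.",
)
labels = torch.tensor([2])  # assumed gold score on a 0-4 scale

out = model(**enc, labels=labels)  # cross-entropy loss over score classes
out.loss.backward()                # one fine-tuning step (optimizer omitted)
pred = out.logits.argmax(dim=-1).item()  # predicted score class
```

The key design point is that item identity enters through the input text rather than through per-item parameters, which is what removes the need to store one fine-tuned model per item.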
Notes
1. Run by the US Dept. of Education: https://github.com/NAEP-AS-Challenge/info.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Fernandez, N. et al. (2022). Automated Scoring for Reading Comprehension via In-context BERT Tuning. In: Rodrigo, M.M., Matsuda, N., Cristea, A.I., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2022. Lecture Notes in Computer Science, vol 13355. Springer, Cham. https://doi.org/10.1007/978-3-031-11644-5_69
DOI: https://doi.org/10.1007/978-3-031-11644-5_69
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11643-8
Online ISBN: 978-3-031-11644-5