Abstract
Automated scoring of open-ended student responses has the potential to significantly reduce human grading effort. Recent approaches leverage textual representations from pre-trained language models such as BERT, training a separate model for each item/question; this is suitable for settings such as essay scoring, where items differ substantially from one another. However, these approaches have two limitations: 1) they fail to exploit item linkage in settings such as reading comprehension, where multiple items may share a reading passage; and 2) they do not scale, since storing one model per item is impractical with large language models. We report our (grand prize-winning) solution to the National Assessment of Educational Progress (NAEP) automated scoring challenge for reading comprehension. Our approach, in-context BERT fine-tuning, produces a single shared scoring model for all items, using a carefully designed input structure to provide contextual information on each item. Our experiments demonstrate that our approach outperforms existing methods. We also perform a qualitative analysis and discuss the limitations of our approach. (The full version of this paper is available at https://arxiv.org/abs/2205.09864 and our implementation at https://github.com/ni9elf/automated-scoring.)
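To make the idea concrete, the sketch below shows how a single shared BERT classifier might be fine-tuned in-context: the item's contextual information is packed into the input alongside the student response, so one model serves every item. This is a minimal sketch assuming the Hugging Face transformers API; the item-context format, score scale, and classification head are illustrative assumptions, not the paper's exact input structure (see the linked implementation for that).

```python
# Minimal sketch of in-context BERT fine-tuning for automated scoring.
# Illustrative only: the item-context format, score scale, and model head
# are assumptions, not the authors' exact implementation.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# A single shared model scores all items; 5 score classes assumed here.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=5
)

def build_input(item_context: str, response: str):
    """Pack the item's context (e.g., its question text) together with the
    student response, so one model can score responses to every item."""
    return tokenizer(
        item_context,   # segment A: identifies/describes the item
        response,       # segment B: the student response to score
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )

# Hypothetical item and response (not from the NAEP data).
enc = build_input(
    "Item 7: Why does the narrator return home at the end of the passage?",
    "He goes back because he realizes he misses his family.",
)
labels = torch.tensor([2])  # assumed gold score on a 0-4 scale

out = model(**enc, labels=labels)  # cross-entropy loss over score classes
out.loss.backward()                # one fine-tuning step (optimizer omitted)
pred = out.logits.argmax(dim=-1).item()  # predicted score class
```

The key design point is that item identity enters through the input text rather than through per-item parameters, which is what removes the need to store one fine-tuned model per item.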
Notes
1. Run by the US Dept. of Education: https://github.com/NAEP-AS-Challenge/info.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Fernandez, N. et al. (2022). Automated Scoring for Reading Comprehension via In-context BERT Tuning. In: Rodrigo, M.M., Matsuda, N., Cristea, A.I., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2022. Lecture Notes in Computer Science, vol 13355. Springer, Cham. https://doi.org/10.1007/978-3-031-11644-5_69
DOI: https://doi.org/10.1007/978-3-031-11644-5_69
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11643-8
Online ISBN: 978-3-031-11644-5