
Modeling essay grading with pre-trained BERT features

Published in: Applied Intelligence

Abstract

Essay writing is an important skill: it requires one to express ideas and understanding of a topic clearly through language articulation and examples. Just as writing essays is a skill, so is grading them. Grading demands considerable effort, and the task becomes tedious and repetitive when the student-to-teacher ratio is high. As with any other repetitive task, technological intervention in the form of automated essay grading has long been considered. The main challenge in automated essay grading, however, lies in understanding language construction, word usage, and the presentation of an idea, argument, or narration; this linguistic complexity makes natural language understanding difficult. In this work, we report experiments with pre-trained static word embeddings, such as GloVe and fastText, and the pre-trained contextual model Bidirectional Encoder Representations from Transformers (BERT) for automated essay grading. For the regression task, we use Long Short-Term Memory (LSTM) and Support Vector Regression (SVR) models under various feature settings derived from the learnt embeddings. Results are reported on all 8 prompts of the ASAP-AES dataset. Our approach achieves an average Quadratic Weighted Kappa (QWK) of 0.81 with SVR and 0.71 with LSTM on in-domain test essays. The SVR model exceeds the human-human agreement of 0.75. To the best of our knowledge, our SVR model with pre-trained BERT embeddings achieves the highest average QWK reported on the ASAP-AES dataset. We further evaluate our approach on adversarial samples generated from permuted essays and off-topic essays, and show experimentally that although our LSTM model does not achieve a high QWK against human-assigned grades, it is robust under the adversarial settings considered.
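The agreement metric reported above, Quadratic Weighted Kappa, can be computed directly from two sets of integer grades. A minimal NumPy sketch (this is not the authors' code; the function name and grade range are illustrative):

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """QWK between two integer rating sequences on [min_rating, max_rating]."""
    n = max_rating - min_rating + 1
    # Observed confusion matrix between the two raters
    O = np.zeros((n, n))
    for a, b in zip(rater_a, rater_b):
        O[a - min_rating, b - min_rating] += 1
    # Quadratic disagreement weights: 0 on the diagonal, growing with distance
    idx = np.arange(n)
    W = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2
    # Expected matrix under chance agreement: outer product of the marginals
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1.0 - (W * O).sum() / (W * E).sum()
```

QWK weights disagreements by the squared distance between grades, so a prediction off by two scale points is penalised four times as much as one off by a single point; perfect agreement gives 1.0 and chance-level agreement gives 0.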


Notes

  1. https://www.kaggle.com/c/asap-aes

  2. https://nlp.stanford.edu/projects/glove/

  3. https://fasttext.cc/docs/en/english-vectors.html

  4. https://www.nltk.org/

  5. https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1
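Under the setting described in the abstract (pooled pre-trained embeddings fed to a regressor), the overall shape of such a pipeline can be sketched with synthetic stand-in embeddings. This is a sketch under assumptions, not the authors' implementation: `embed_essay` stands in for GloVe/fastText lookups or BERT outputs, and ridge regression stands in for the SVR regressor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for pre-trained token embeddings; in the paper these would come
# from GloVe/fastText lookups or BERT's 768-d contextual outputs.
def embed_essay(num_tokens, dim=32):
    return rng.normal(size=(num_tokens, dim))

# One fixed-length feature vector per essay via mean pooling over tokens
essays = [embed_essay(rng.integers(50, 200)) for _ in range(100)]
X = np.stack([e.mean(axis=0) for e in essays])
y = rng.integers(0, 11, size=100).astype(float)  # synthetic grades 0..10

# Ridge regression (closed form) as a stand-in for SVR
lam = 1.0
Xb = np.hstack([X, np.ones((len(X), 1))])        # append a bias column
w = np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)

# Round and clip predictions back onto the grade scale
preds = np.clip(np.rint(Xb @ w), 0, 10)
```

The shape of the computation (pool token embeddings into a fixed vector, regress, then round to the prompt's grade scale) is the same regardless of which embedding and regressor are plugged in.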


Funding

Author Annapurna Sharma is supported by the Visvesvaraya PhD Scheme, Ministry of Electronics and Information Technology (MeitY), Government of India, under grant number MEITY-PHD-2541.

Author information


Corresponding author

Correspondence to Annapurna Sharma.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sharma, A., Jayagopi, D.B. Modeling essay grading with pre-trained BERT features. Appl Intell 54, 4979–4993 (2024). https://doi.org/10.1007/s10489-024-05410-4

