Abstract
Studies on automated short-answer scoring (SAS) have applied natural language processing to education. Short-answer scoring is the task of grading responses based on their linguistic content. In actual educational settings, however, most answer sheets for short-answer questions are handwritten, which is a barrier to SAS. We have therefore developed a system that combines handwritten character recognition with natural language processing to fully automate the scoring of handwritten responses to short-answer questions. To our knowledge, the graded responses used in this study form the most extensive scoring dataset for short-answer questions, and it may be the largest in the world. Measuring agreement between automated and human scores with Cohen's kappa coefficient yields 0.86 in the worst case and approximately 0.95 for the remaining five questions. These results show that the proposed fully automated scoring system can score with accuracy comparable to that of human graders.
H. Oka—Work done while at The University of Tokyo.
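The abstract reports agreement between automated and human scores as a Cohen's kappa coefficient. For graded (ordinal) scores such as these, the weighted variant of kappa (Cohen, 1968) with quadratic weights is the usual choice. The sketch below is illustrative only — the function name, label setup, and choice of quadratic weights are assumptions, not details taken from the paper:

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, num_labels):
    """Cohen's weighted kappa with quadratic weights, a common
    agreement measure for ordinal (graded) scores."""
    rater_a = np.asarray(rater_a)
    rater_b = np.asarray(rater_b)
    # Observed confusion matrix between the two raters.
    O = np.zeros((num_labels, num_labels))
    for a, b in zip(rater_a, rater_b):
        O[a, b] += 1
    # Expected matrix under chance agreement (outer product of marginals).
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    # Quadratic disagreement weights: penalty grows with score distance.
    idx = np.arange(num_labels)
    W = (idx[:, None] - idx[None, :]) ** 2 / (num_labels - 1) ** 2
    return 1.0 - (W * O).sum() / (W * E).sum()

# Perfect agreement gives kappa = 1.0; one adjacent-score
# disagreement lowers it in proportion to the squared distance.
print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 4))  # 1.0
print(quadratic_weighted_kappa([0, 0, 1, 2], [0, 1, 1, 2], 3))  # 0.8
```

A value near 0.95, as reported for five of the six questions, indicates agreement with human graders at a level commonly interpreted as almost perfect.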
Acknowledgements
This work was supported by JSPS KAKENHI Grant Number JP20H04300 and JST A-STEP Grant Number JPMJTM20ML.
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Oka, H., Nguyen, H.T., Nguyen, C.T., Nakagawa, M., Ishioka, T. (2022). Fully Automated Short Answer Scoring of the Trial Tests for Common Entrance Examinations for Japanese University. In: Rodrigo, M.M., Matsuda, N., Cristea, A.I., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2022. Lecture Notes in Computer Science, vol 13355. Springer, Cham. https://doi.org/10.1007/978-3-031-11644-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11643-8
Online ISBN: 978-3-031-11644-5