Two Experiments for Automatic Scoring of Handwritten Descriptive Answers

  • Conference paper
Document Analysis Systems (DAS 2024)

Abstract

This paper presents the motivation, design, and two experiments for automatic scoring of handwritten descriptive answers. The first experiment scores handwritten short descriptive answers in Japanese language exams. We used a deep neural network (DNN)-based handwriting recognizer and a transformer-based automatic scorer, without correcting misrecognized characters or adding rubric annotations for scoring. The automatic scores reached acceptable agreement with the human scores while using only 1.7% of the human-scored answers for training. The second experiment scores descriptive answers written on electronic paper for Japanese, English, and math drills. We used DNN-based online and offline handwriting recognizers for each subject and applied simple exact (perfect) matching between the recognized candidates and the correct answers. The experiment shows that combining the online and offline recognizers reduces the false negative rate, and that rejecting answers with low recognition scores reduces the false positive rate. Even with the current system, human scorers need to manually score less than 30% of the answers, with false positive (risky) scores of about 2% or less for the three subjects.
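
The decision logic of the second experiment, as the abstract describes it, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name `score_answer`, the candidate format (text paired with a score between 0 and 1), the threshold `reject_below`, and the rule for routing rejected answers to a human scorer are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical candidate format: (recognized text, recognition score in [0, 1]).
Candidate = Tuple[str, float]


def score_answer(
    online: List[Candidate],
    offline: List[Candidate],
    correct: str,
    reject_below: float = 0.5,  # assumed threshold, not taken from the paper
) -> str:
    """Return "correct", "incorrect", or "manual" (left to a human scorer)."""
    # Pool candidates from both recognizers: an answer that only one
    # recognizer reads correctly can still match, which is how combining
    # recognizers can lower the false negative rate.
    candidates = list(online) + list(offline)
    if not candidates:
        return "manual"

    # Exact matching of recognized candidates against the correct answer.
    matched_scores = [score for text, score in candidates if text == correct]
    if matched_scores:
        # A match backed only by low recognition scores is rejected rather
        # than accepted, which is how rejection can lower false positives.
        return "correct" if max(matched_scores) >= reject_below else "manual"

    # No candidate matches: trust the "incorrect" verdict only when at
    # least one recognizer was confident about what it read.
    best_score = max(score for _, score in candidates)
    return "incorrect" if best_score >= reject_below else "manual"
```

Under these assumptions, the answers returned as "manual" correspond to the share that human scorers still score by hand, and raising `reject_below` trades a larger manual share for fewer risky automatic scores.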

C. T. Nguyen—Work done while at Tokyo University of Agriculture and Technology.

H. Oka—Work done while at The University of Tokyo.

Acknowledgement

This work is partially supported by the joint research budget from WACOM Co., Ltd. and by KAKENHI grants JP24H00738, JP23H03511, JP22H00085, and JP21K18136.

Author information

Corresponding author

Correspondence to Masaki Nakagawa.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Nakagawa, M. et al. (2024). Two Experiments for Automatic Scoring of Handwritten Descriptive Answers. In: Sfikas, G., Retsinas, G. (eds) Document Analysis Systems. DAS 2024. Lecture Notes in Computer Science, vol 14994. Springer, Cham. https://doi.org/10.1007/978-3-031-70442-0_1

  • DOI: https://doi.org/10.1007/978-3-031-70442-0_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70441-3

  • Online ISBN: 978-3-031-70442-0

  • eBook Packages: Computer Science, Computer Science (R0)
