Generalizable Automatic Short Answer Scoring via Prototypical Neural Network

  • Conference paper
Artificial Intelligence in Education (AIED 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13916)

Abstract

We investigated the challenging task of generalizable automatic short answer scoring (ASAS), in which a scoring model must generalize to target domains (for which only limited labeled data are available) that have no overlap with the auxiliary domains on which the model is trained. To address this, we introduced a framework based on the Prototypical Neural Network (PNN). Specifically, for a target short answer instance whose score needs to be determined, the framework first calculates the distance between the target instance and each cluster of support instances. Support instances are a set of labeled short answer instances that are grouped into clusters according to their labels, i.e., their ground-truth scores. The framework then assigns the target instance the ground-truth score of the cluster that is closest to it. Through extensive empirical studies on an open-source ASAS dataset consisting of 10 different question prompts, we observed that the proposed approach consistently outperformed the baselines across settings with different numbers of support instances. We further observed that the proposed approach performed better when trained on a wider range of data sources than when trained on a restricted set, suggesting that including more data sources for training may improve the generalizability of the proposed framework.
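The scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each cluster is summarized by the mean of its support embeddings and that distance is Euclidean, as in standard prototypical networks [22], neither of which the abstract specifies. The function names (`build_prototypes`, `predict_score`) and the toy 2-D vectors, which stand in for the output of a learned text encoder, are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def build_prototypes(support_embeddings, support_scores):
    """Group support embeddings by score and average each group.

    Assumption: a cluster is represented by the mean of its members.
    """
    clusters = defaultdict(list)
    for vec, score in zip(support_embeddings, support_scores):
        clusters[score].append(vec)
    return {
        score: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for score, vecs in clusters.items()
    }


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_score(target_embedding, prototypes):
    """Return the score whose prototype is closest to the target."""
    return min(
        prototypes,
        key=lambda score: euclidean(target_embedding, prototypes[score]),
    )


# Toy data: hypothetical 2-D embeddings standing in for encoder outputs.
support_embeddings = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8], [2.0, 2.0]]
support_scores = [0, 0, 1, 1, 2]

prototypes = build_prototypes(support_embeddings, support_scores)
predicted = predict_score([0.9, 1.1], prototypes)
print(predicted)  # prints 1: the target lies nearest the score-1 cluster
```

In this sketch, adding support instances only changes the prototypes, not the model, which is one reason a prototype-based approach suits settings where the target domain offers only a handful of labeled answers.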

Notes

  1. All the experiments were completed on an NVIDIA Tesla T4 GPU with 16 GB of RAM.

  2. https://www.kaggle.com/competitions/asap-sas/data

  3. We follow [33] to define generalizability as the capacity of a model to predict over previously unseen domains.

  4. This restriction also applies to META due to its own limitation (see Sect. 1).

  5. https://github.com/huggingface/transformers

References

  1. Alikaniotis, D., Yannakoudakis, H., Rei, M.: Automatic text scoring using neural networks. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 715–725 (2016)

  2. Baral, S., Botelho, A.F., Erickson, J.A., Benachamardi, P., Heffernan, N.T.: Improving automated scoring of student open responses in mathematics. Int. Educ. Data Min. Soc. (2021)

  3. Blanc, G., Rendle, S.: Adaptive sampled softmax with kernel based sampling. In: International Conference on Machine Learning, pp. 590–599. PMLR (2018)

  4. Boney, R., Ilin, A., et al.: Active one-shot learning with prototypical networks. In: ESANN (2019)

  5. Condor, A., Litster, M., Pardos, Z.: Automatic short answer grading with SBERT on out-of-sample questions. In: Proceedings of the 14th International Conference on Educational Data Mining (2021)

  6. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186 (2019)

  7. Dong, N., Xing, E.P.: Few-shot semantic segmentation with prototype learning. In: BMVC, vol. 3 (2018)

  8. Dronen, N., Foltz, P.W., Habermehl, K.: Effective sampling for large-scale automated writing evaluation systems. In: Proceedings of the Second (2015) ACM Conference on Learning@ Scale, pp. 3–10 (2015)

  9. Fazal, A., Dillon, T., Chang, E.: Noise reduction in essay datasets for automated essay grading. In: Meersman, R., Dillon, T., Herrero, P. (eds.) OTM 2011. LNCS, vol. 7046, pp. 484–493. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25126-9_60

  10. Geng, R., Li, B., Li, Y., Zhu, X., Jian, P., Sun, J.: Induction networks for few-shot text classification. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3904–3913 (2019)

  11. Jakubik, J., Blumenstiel, B., Voessing, M., Hemmer, P.: Instance selection mechanisms for human-in-the-loop systems in few-shot learning. 6 (2022)

  12. Jiang, Z., Liu, M., Yin, Y., Yu, H., Cheng, Z., Gu, Q.: Learning from graph propagation via ordinal distillation for one-shot automated essay scoring. In: Proceedings of the Web Conference 2021, pp. 2347–2356 (2021)

  13. Jin, C., He, B., Hui, K., Sun, L.: TDNN: a two-stage deep neural network for prompt-independent automated essay scoring. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1088–1097 (2018)

  14. Jurman, G., Riccadonna, S., Visintainer, R., Furlanello, C.: Canberra distance on ranked lists. In: Advances in Ranking NIPS 09 Workshop (2009)

  15. Leacock, C., Chodorow, M.: C-rater: automated scoring of short-answer questions. Comput. Humanit. 37(4), 389–405 (2003)

  16. Li, O., Liu, H., Chen, C., Rudin, C.: Deep learning for case-based reasoning through prototypes: a neural network that explains its predictions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)

  17. Nau, J., Haendchen Filho, A., Passero, G.: Evaluating semantic analysis methods for short answer grading using linear regression. Sciences 3(2), 437–450 (2017)

  18. Pappano, L.: The year of the MOOC. N. Y. Times 2(12), 2012 (2012)

  19. Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3982–3992 (2019)

  20. Rendle, S., Freudenthaler, C., Gantner, Z., Schmidt-Thieme, L.: BPR: Bayesian personalized ranking from implicit feedback. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pp. 452–461 (2009)

  21. Ridley, R., He, L., Dai, X., Huang, S., Chen, J.: Prompt agnostic essay scorer: a domain generalization approach to cross-prompt automated essay scoring. arXiv preprint arXiv:2008.01441 (2020)

  22. Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. Adv. Neural Inf. Process. Syst. 30 (2017)

  23. Sultan, M.A., Salazar, C., Sumner, T.: Fast and easy short answer grading with high accuracy. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1070–1075 (2016)

  24. Sung, C., Dhamecha, T., Saha, S., Ma, T., Reddy, V., Arora, R.: Pre-training BERT on domain resources for short answer grading. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 6071–6075 (2019)

  25. Sung, C., Dhamecha, T.I., Mukhi, N.: Improving short answer grading using transformer-based pre-training. In: Isotani, S., Millán, E., Ogan, A., Hastings, P., McLaren, B., Luckin, R. (eds.) AIED 2019. LNCS (LNAI), vol. 11625, pp. 469–481. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-23204-7_39

  26. Surya, K., Gayakwad, E., Nallakaruppan, M.: Deep learning for short answer scoring. Int. J. Recent Technol. Eng. 7(6), 1712–1715 (2019)

  27. Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., et al.: Matching networks for one shot learning. Adv. Neural Inf. Process. Syst. 29 (2016)

  28. Wind, S.A., Peterson, M.E.: A systematic review of methods for evaluating rating quality in language assessment. Lang. Test. 35(2), 161–192 (2018)

  29. Xia, L., Guan, M., Liu, J., Cao, X., Luo, D.: Attention-based bidirectional long short-term memory neural network for short answer scoring. In: Guan, M., Na, Z. (eds.) MLICOM 2020. LNICST, vol. 342, pp. 104–112. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-66785-6_12

  30. Zeng, Z., Li, X., Gasevic, D., Chen, G.: Do deep neural nets display human-like attention in short answer scoring? In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 191–205 (2022)

  31. Zeng, Z., Lin, J., Li, L., Pan, W., Ming, Z.: Next-item recommendation via collaborative filtering with bidirectional item similarity. ACM Trans. Inf. Syst. (TOIS) 38(1), 1–22 (2019)

  32. Zesch, T., Heilman, M., Cahill, A.: Reducing annotation efforts in supervised short answer scoring. In: Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 124–132 (2015)

  33. Zhang, M., Baral, S., Heffernan, N., Lan, A.: Automatic short math answer grading via in-context meta-learning. In: Proceedings of the 15th International Conference on Educational Data Mining (2022)

  34. Zhu, Z., Wang, J., Caverlee, J.: Measuring and mitigating item under-recommendation bias in personalized ranking systems. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 449–458 (2020)

Author information

Corresponding author

Correspondence to Guanliang Chen.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zeng, Z., Li, L., Guan, Q., Gašević, D., Chen, G. (2023). Generalizable Automatic Short Answer Scoring via Prototypical Neural Network. In: Wang, N., Rebolledo-Mendez, G., Matsuda, N., Santos, O.C., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2023. Lecture Notes in Computer Science, vol 13916. Springer, Cham. https://doi.org/10.1007/978-3-031-36272-9_36

  • DOI: https://doi.org/10.1007/978-3-031-36272-9_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36271-2

  • Online ISBN: 978-3-031-36272-9

  • eBook Packages: Computer Science, Computer Science (R0)
