Generalizable Automatic Short Answer Scoring via Prototypical Neural Network

Zeng, Zijie; Li, Lin; Guan, Quanlong; Gašević, Dragan; Chen, Guanliang

doi:10.1007/978-3-031-36272-9_36

Zijie Zeng¹², Lin Li¹², Quanlong Guan¹³, Dragan Gašević¹² & Guanliang Chen¹²

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13916)

Included in the following conference series:

International Conference on Artificial Intelligence in Education

Abstract

We investigated the challenging task of generalizable automatic short answer scoring (ASAS), in which a scoring model must generalize to target domains (for which only limited labeled data are available) that do not overlap with the auxiliary domains on which the model is trained. To address this task, we introduced a framework based on the Prototypical Neural Network (PNN). Specifically, to score a target short answer instance, the framework first calculates the distance between this instance and each cluster of support instances, where the support instances are a set of labeled short answers grouped into clusters according to their labels (i.e., their ground-truth scores). It then assigns the target instance the ground-truth score of the cluster closest to it. Through extensive empirical studies on an open-source ASAS dataset covering 10 different question prompts, we observed that the proposed approach consistently outperformed the baselines across settings with different numbers of support instances. We further observed that the approach performed better when trained on a wider range of data sources than when trained on restricted data sources, showing that including more data sources in training may improve the generalizability of the proposed framework.
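The scoring step described above can be sketched as nearest-prototype classification. This is a minimal sketch, not the authors' implementation: it assumes each short answer has already been mapped to a fixed-length embedding by some encoder (this preview does not say which), represents each score cluster by the mean of its support embeddings as in the original prototypical networks of Snell et al., and uses squared Euclidean distance. The function names and toy vectors are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def compute_prototypes(support: List[Tuple[Vector, int]]) -> Dict[int, List[float]]:
    """Group support embeddings by their ground-truth score and average each group.

    Each averaged vector is the 'prototype' representing one score cluster.
    """
    groups: Dict[int, List[Vector]] = defaultdict(list)
    for embedding, score in support:
        groups[score].append(embedding)

    prototypes: Dict[int, List[float]] = {}
    for score, members in groups.items():
        dim = len(members[0])
        # Element-wise mean over all support embeddings that share this score.
        prototypes[score] = [sum(vec[i] for vec in members) / len(members) for i in range(dim)]
    return prototypes


def squared_euclidean(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance, the metric used by Snell et al. (2017)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def score_instance(query: Vector, prototypes: Dict[int, List[float]]) -> int:
    """Assign the query the score label of its nearest prototype."""
    if not prototypes:
        raise ValueError("at least one labeled support instance is required")
    return min(prototypes, key=lambda score: squared_euclidean(query, prototypes[score]))


if __name__ == "__main__":
    # Toy 2-D 'embeddings' standing in for encoder outputs (hypothetical data).
    support = [([0.0, 0.0], 0), ([0.2, 0.0], 0), ([1.0, 1.0], 1), ([1.2, 0.8], 1), ([3.0, 3.0], 2)]
    prototypes = compute_prototypes(support)
    print(score_instance([0.9, 1.1], prototypes))  # prints 1
```

In this sketch the prototypes are simply per-score means, so changing the number of support instances per score only changes those averages; the encoder itself is untouched, which is what would allow a trained encoder to be reused on an unseen question prompt given a few labeled answers.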

Notes

1. All experiments were run on an NVIDIA Tesla T4 GPU with 16 GB of memory.
2. https://www.kaggle.com/competitions/asap-sas/data.
3. Following [33], we define generalizability as the capacity of a model to make predictions on previously unseen domains.
4. This restriction also applies to META because of its own limitation (see Sect. 1).
5. https://github.com/huggingface/transformers.

References

Alikaniotis, D., Yannakoudakis, H., Rei, M.: Automatic text scoring using neural networks. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 715–725 (2016)
Baral, S., Botelho, A.F., Erickson, J.A., Benachamardi, P., Heffernan, N.T.: Improving automated scoring of student open responses in mathematics. Int. Educ. Data Min. Soc. (2021)
Blanc, G., Rendle, S.: Adaptive sampled softmax with kernel based sampling. In: International Conference on Machine Learning, pp. 590–599. PMLR (2018)
Boney, R., Ilin, A., et al.: Active one-shot learning with prototypical networks. In: ESANN (2019)
Condor, A., Litster, M., Pardos, Z.: Automatic short answer grading with SBERT on out-of-sample questions. In: Proceedings of the 14th International Conference on Educational Data Mining (2021)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186 (2019)
Dong, N., Xing, E.P.: Few-shot semantic segmentation with prototype learning. In: BMVC, vol. 3 (2018)
Dronen, N., Foltz, P.W., Habermehl, K.: Effective sampling for large-scale automated writing evaluation systems. In: Proceedings of the Second (2015) ACM Conference on Learning@Scale, pp. 3–10 (2015)
Fazal, A., Dillon, T., Chang, E.: Noise reduction in essay datasets for automated essay grading. In: Meersman, R., Dillon, T., Herrero, P. (eds.) OTM 2011. LNCS, vol. 7046, pp. 484–493. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25126-9_60
Geng, R., Li, B., Li, Y., Zhu, X., Jian, P., Sun, J.: Induction networks for few-shot text classification. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3904–3913 (2019)
Jakubik, J., Blumenstiel, B., Voessing, M., Hemmer, P.: Instance selection mechanisms for human-in-the-loop systems in few-shot learning. 6 (2022)
Jiang, Z., Liu, M., Yin, Y., Yu, H., Cheng, Z., Gu, Q.: Learning from graph propagation via ordinal distillation for one-shot automated essay scoring. In: Proceedings of the Web Conference 2021, pp. 2347–2356 (2021)
Jin, C., He, B., Hui, K., Sun, L.: TDNN: a two-stage deep neural network for prompt-independent automated essay scoring. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1088–1097 (2018)
Jurman, G., Riccadonna, S., Visintainer, R., Furlanello, C.: Canberra distance on ranked lists. In: Advances in Ranking NIPS 09 Workshop (2009)
Leacock, C., Chodorow, M.: C-rater: automated scoring of short-answer questions. Comput. Humanit. 37(4), 389–405 (2003)
Li, O., Liu, H., Chen, C., Rudin, C.: Deep learning for case-based reasoning through prototypes: a neural network that explains its predictions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)
Nau, J., Haendchen Filho, A., Passero, G.: Evaluating semantic analysis methods for short answer grading using linear regression. Sciences 3(2), 437–450 (2017)
Pappano, L.: The year of the MOOC. N. Y. Times 2(12), 2012 (2012)
Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using siamese BERT-networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3982–3992 (2019)
Rendle, S., Freudenthaler, C., Gantner, Z., Schmidt-Thieme, L.: BPR: Bayesian personalized ranking from implicit feedback. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pp. 452–461 (2009)
Ridley, R., He, L., Dai, X., Huang, S., Chen, J.: Prompt agnostic essay scorer: a domain generalization approach to cross-prompt automated essay scoring. arXiv preprint arXiv:2008.01441 (2020)
Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. Adv. Neural Inf. Process. Syst. 30 (2017)
Sultan, M.A., Salazar, C., Sumner, T.: Fast and easy short answer grading with high accuracy. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1070–1075 (2016)
Sung, C., Dhamecha, T., Saha, S., Ma, T., Reddy, V., Arora, R.: Pre-training BERT on domain resources for short answer grading. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 6071–6075 (2019)
Sung, C., Dhamecha, T.I., Mukhi, N.: Improving short answer grading using transformer-based pre-training. In: Isotani, S., Millán, E., Ogan, A., Hastings, P., McLaren, B., Luckin, R. (eds.) AIED 2019. LNCS (LNAI), vol. 11625, pp. 469–481. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-23204-7_39
Surya, K., Gayakwad, E., Nallakaruppan, M.: Deep learning for short answer scoring. Int. J. Recent Technol. Eng. 7(6), 1712–1715 (2019)
Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., et al.: Matching networks for one shot learning. Adv. Neural Inf. Process. Syst. 29 (2016)
Wind, S.A., Peterson, M.E.: A systematic review of methods for evaluating rating quality in language assessment. Lang. Test. 35(2), 161–192 (2018)
Xia, L., Guan, M., Liu, J., Cao, X., Luo, D.: Attention-based bidirectional long short-term memory neural network for short answer scoring. In: Guan, M., Na, Z. (eds.) MLICOM 2020. LNICST, vol. 342, pp. 104–112. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-66785-6_12
Zeng, Z., Li, X., Gasevic, D., Chen, G.: Do deep neural nets display human-like attention in short answer scoring? In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 191–205 (2022)
Zeng, Z., Lin, J., Li, L., Pan, W., Ming, Z.: Next-item recommendation via collaborative filtering with bidirectional item similarity. ACM Trans. Inf. Syst. (TOIS) 38(1), 1–22 (2019)
Zesch, T., Heilman, M., Cahill, A.: Reducing annotation efforts in supervised short answer scoring. In: Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 124–132 (2015)
Zhang, M., Baral, S., Heffernan, N., Lan, A.: Automatic short math answer grading via in-context meta-learning. In: Proceedings of the 15th International Conference on Educational Data Mining (2022)
Zhu, Z., Wang, J., Caverlee, J.: Measuring and mitigating item under-recommendation bias in personalized ranking systems. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 449–458 (2020)

Author information

Authors and Affiliations

Centre for Learning Analytics, Monash University, Melbourne, Australia
Zijie Zeng, Lin Li, Dragan Gašević & Guanliang Chen
Jinan University, Guangzhou, China
Quanlong Guan

Corresponding author

Correspondence to Guanliang Chen.

Editor information

Editors and Affiliations

University of Southern California, Los Angeles, CA, USA
Ning Wang
University of British Columbia, Vancouver, BC, Canada
Genaro Rebolledo-Mendez
North Carolina State University, Raleigh, NC, USA
Noboru Matsuda
Office 3.01, aDeNu Research Group, UNED, Madrid, Spain
Olga C. Santos
University of Leeds, Leeds, UK
Vania Dimitrova

About this paper

Cite this paper

Zeng, Z., Li, L., Guan, Q., Gašević, D., Chen, G. (2023). Generalizable Automatic Short Answer Scoring via Prototypical Neural Network. In: Wang, N., Rebolledo-Mendez, G., Matsuda, N., Santos, O.C., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2023. Lecture Notes in Computer Science, vol. 13916. Springer, Cham. https://doi.org/10.1007/978-3-031-36272-9_36

DOI: https://doi.org/10.1007/978-3-031-36272-9_36
Published: 26 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36271-2
Online ISBN: 978-3-031-36272-9
eBook Packages: Computer Science, Computer Science (R0)
