Abstract
Students of Health Informatics in the degree course of Medicine and Surgery at the University of L’Aquila (Italy) must, in order to pass the exam, submit solutions to assignments that require the execution and interpretation of statistical analyses. The paper presents a tool for the automated grading of such solutions, in which the statistical analyses consist of R commands and their outputs, and the interpretations are short text answers. The tool performs a static analysis of the R commands and their respective outputs, and uses Natural Language Processing techniques for the short text answers. The paper summarises the approach to the R commands and outputs, and delves into the method used for the automated classification of the short text answers and the results obtained. In particular, we show that, using FastText sentence embeddings and a tuned Support Vector Machine classifier, we obtained an accuracy of 0.89, Cohen’s κ = 0.76, and an F1 score of 0.91 on a binary classification task (i.e. pass or fail). Further experiments with additional linguistically motivated features, whose goal was to capture lexical differences between the student’s answer and the gold standard sentence, did not yield any significant improvement. The paper ends with a discussion of the findings and the next steps in our research.
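The abstract reports three figures for the pass/fail classifier: accuracy, Cohen’s κ and F1. The following minimal Python sketch shows how all three can be derived from the same 2×2 confusion matrix. It assumes that "pass" is treated as the positive class for F1 (the abstract does not say which class was used), and the labels in the usage example are invented toy data, not the paper’s dataset.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """2x2 confusion matrix for a binary pass/fail grader."""
    tp: int  # gold "pass", predicted "pass"
    fp: int  # gold "fail", predicted "pass"
    fn: int  # gold "pass", predicted "fail"
    tn: int  # gold "fail", predicted "fail"

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def confusion(gold: Sequence[str], pred: Sequence[str], positive: str = "pass") -> Confusion:
    """Count agreements and disagreements between gold and predicted labels.
    ASSUMPTION: "pass" is the positive class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = tn = 0
    for g, p in zip(gold, pred):
        if p == positive:
            if g == positive:
                tp += 1
            else:
                fp += 1
        elif g == positive:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def accuracy(c: Confusion) -> float:
    # Share of answers on which the classifier agrees with the gold label.
    return (c.tp + c.tn) / c.n


def f1(c: Confusion) -> float:
    # Harmonic mean of precision and recall for the positive class.
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


def cohen_kappa(c: Confusion) -> float:
    # Observed agreement corrected for the agreement expected by chance,
    # where chance agreement comes from the label marginals.
    p_o = accuracy(c)
    p_pos = ((c.tp + c.fp) / c.n) * ((c.tp + c.fn) / c.n)  # both "pass" by chance
    p_neg = ((c.fn + c.tn) / c.n) * ((c.fp + c.tn) / c.n)  # both "fail" by chance
    p_e = p_pos + p_neg
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy labels for illustration only.
    gold = ["pass"] * 7 + ["fail"] * 3
    pred = ["pass"] * 6 + ["fail", "pass", "fail", "fail"]
    c = confusion(gold, pred)
    print(f"accuracy={accuracy(c):.2f} F1={f1(c):.2f} kappa={cohen_kappa(c):.2f}")
    # accuracy=0.80 F1=0.86 kappa=0.52
```

Because κ subtracts the chance agreement implied by the marginals, it can never exceed raw accuracy, which is consistent with κ = 0.76 being lower than the accuracy of 0.89 reported above.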
Notes
1. By using a normalised Levenshtein string similarity measure [17].
2. https://fasttext.cc/ (last accessed July 2019).
3. https://fasttext.cc/docs/en/crawl-vectors.html (last accessed July 2019).
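Note 1 mentions a normalised Levenshtein string similarity [17], presumably in connection with the lexical features described in the abstract. The following Python sketch shows one common way to compute such a score: the character-level edit distance divided by the length of the longer string, subtracted from 1. The normalisation, the character-level granularity and the example sentences are assumptions for illustration; the note does not state how the distance was normalised or what preprocessing was applied, and the example sentences are invented and in English for readability.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (Wagner-Fischer DP, two rows)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string inner
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or match if equal
        prev = curr
    return prev[-1]


def normalised_levenshtein_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1], where 1 means identical strings.
    ASSUMPTION: normalised by the length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    gold = "the mean is 5.2"     # hypothetical gold-standard sentence
    student = "the mean is 5.3"  # hypothetical student answer
    print(f"{normalised_levenshtein_similarity(gold, student):.2f}")  # 0.93
```

Normalising by the longer string keeps the score in [0, 1] regardless of answer length, so short and long answers can be compared on the same scale; note that a purely character-level score rates the example above as highly similar even though the numeric value, which matters most for grading, is wrong.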
References
1. Angelone, A.M., Menini, S., Tonelli, S., Vittorini, P.: Dataset: short sentences on R analyses in a health informatics subject, June 2019. https://doi.org/10.5281/ZENODO.3257363
2. Angelone, A.M., Vittorini, P.: The automated grading of R code snippets: preliminary results in a course of health informatics. In: Gennari, R., et al. (eds.) MIS4TEL 2019. AISC, vol. 1007, pp. 19–27. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-23990-9_3
3. Aprosio, A.P., Moretti, G.: Tint 2.0: an all-inclusive suite for NLP in Italian. In: Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Torino, Italy, 10–12 December 2018 (2018). http://ceur-ws.org/Vol-2253/paper58.pdf
4. Bernardi, A., et al.: On the design and development of an assessment system with adaptive capabilities. In: Di Mascio, T., et al. (eds.) MIS4TEL 2018. AISC, vol. 804, pp. 190–199. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-98872-6_23
5. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017). https://doi.org/10.1162/tacl_a_00051, https://www.aclweb.org/anthology/Q17-1010
6. Bowman, S.R., Angeli, G., Potts, C., Manning, C.D.: A large annotated corpus for learning natural language inference. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 632–642. Association for Computational Linguistics, Lisbon, September 2015. https://doi.org/10.18653/v1/D15-1075, https://www.aclweb.org/anthology/D15-1075
7. Burrows, S., Gurevych, I., Stein, B.: The eras and trends of automatic short answer grading. Int. J. Artif. Intell. Educ. 25(1), 60–117 (2015)
8. Cer, D., et al.: Universal sentence encoder. In: Submission to: EMNLP Demonstration, Brussels, Belgium (2018). https://arxiv.org/abs/1803.11175
9. Cicchetti, D.V.: Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol. Assess. 6(4), 284–290 (1994)
10. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, June 2019. https://www.aclweb.org/anthology/N19-1423
11. Gomaa, W.H., Fahmy, A.A.: A survey of text similarity approaches. Int. J. Comput. Appl. 68(13), 13–18 (2013). https://doi.org/10.5120/11638-7118
12. Harlen, W., James, M.: Assessment and learning: differences and relationships between formative and summative assessment. Assess. Educ.: Principles Policy Pract. 4(3), 365–379 (1997). https://doi.org/10.1080/0969594970040304
13. Hsu, C.W., Chang, C.C., Lin, C.J.: A practical guide to support vector classification. Technical report, National Taiwan University (2016)
14. Kiros, J., Chan, W.: InferLite: simple universal sentence representations from natural language inference data. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018, pp. 4868–4874 (2018). https://aclanthology.info/papers/D18-1524/d18-1524
15. Kuhn, M.: Building predictive models in R using the caret package. J. Stat. Softw. 28(5), 1–26 (2008). https://doi.org/10.18637/jss.v028.i05
16. Kusner, M., Sun, Y., Kolkin, N., Weinberger, K.: From word embeddings to document distances. In: International Conference on Machine Learning, pp. 957–966 (2015)
17. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions and reversals. In: Soviet Physics Doklady, vol. 10, p. 707 (1966)
18. Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F.: e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien (2019). https://CRAN.R-project.org/package=e1071. Accessed July 2019
19. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space (2013)
20. Mohler, M., Bunescu, R., Mihalcea, R.: Learning to grade short answer questions using semantic similarity measures and dependency graph alignments. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT 2011, pp. 752–762. Association for Computational Linguistics, Stroudsburg (2011). http://dl.acm.org/citation.cfm?id=2002472.2002568
21. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of EMNLP (2014)
22. Peters, M.E., et al.: Deep contextualized word representations. In: Walker, M.A., Ji, H., Stent, A. (eds.) NAACL-HLT, pp. 2227–2237. Association for Computational Linguistics (2018). http://dblp.uni-trier.de/db/conf/naacl/naacl2018-1.html#PetersNIGCLZ18
23. R Core Team: R: A Language and Environment for Statistical Computing (2018). https://www.R-project.org/
24. Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge (2001)
25. Souza, D.M., Felizardo, K.R., Barbosa, E.F.: A systematic literature review of assessment tools for programming assignments. In: 2016 IEEE 29th International Conference on Software Engineering Education and Training (CSEET), pp. 147–156. IEEE, April 2016. https://doi.org/10.1109/CSEET.2016.48
26. Urbanek, S.: rJava: Low-Level R to Java Interface, R package version 0.9-11 (2019). https://CRAN.R-project.org/package=rJava. Accessed July 2019
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
De Gasperis, G., Menini, S., Tonelli, S., Vittorini, P. (2019). Automated Grading of Short Text Answers: Preliminary Results in a Course of Health Informatics. In: Herzog, M., Kubincová, Z., Han, P., Temperini, M. (eds) Advances in Web-Based Learning – ICWL 2019. ICWL 2019. Lecture Notes in Computer Science, vol 11841. Springer, Cham. https://doi.org/10.1007/978-3-030-35758-0_18
DOI: https://doi.org/10.1007/978-3-030-35758-0_18
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35757-3
Online ISBN: 978-3-030-35758-0
eBook Packages: Computer Science, Computer Science (R0)