Automated Grading of Short Text Answers: Preliminary Results in a Course of Health Informatics

De Gasperis, Giovanni; Menini, Stefano; Tonelli, Sara; Vittorini, Pierpaolo

doi:10.1007/978-3-030-35758-0_18

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11841)

Included in the following conference series:

International Conference on Web-Based Learning

Abstract

Students learning Health Informatics in the degree course of Medicine and Surgery at the University of L’Aquila (Italy) are required, in order to pass the exam, to submit solutions to assignments concerning the execution and interpretation of statistical analyses. The paper presents a tool for the automated grading of such solutions, where the statistical analyses are made up of R commands and their outputs, and the interpretations are short text answers. The tool performs a static analysis of the R commands and their respective output, and uses Natural Language Processing techniques for the short text answers. The paper summarises the solution regarding the R commands and output, and delves into the method used for the automated classification of the short text answers and the results obtained. In particular, we show that with FastText sentence embeddings and a tuned Support Vector Machine classifier we obtained an accuracy of 0.89, Cohen’s K = 0.76, and an F1 score of 0.91 on a binary classification task (i.e. pass or fail). Other experiments that included additional linguistically motivated features, whose goal was to capture lexical differences between the student’s answer and the gold standard sentence, did not yield any significant improvement. The paper ends with a discussion of the findings and the next steps to be taken in our research.
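As a reading aid for the three figures reported in the abstract, the following minimal sketch shows how accuracy, Cohen's kappa, and the F1 score are derived from a binary pass/fail confusion matrix. The function names, the `Confusion` class, and the label counts are hypothetical illustrations; they are not the paper's data and not the authors' code (the reference list points to R packages, whereas this sketch uses plain Python).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts, with "pass" treated as the positive class."""
    tp: int  # gold pass, predicted pass
    fp: int  # gold fail, predicted pass
    fn: int  # gold pass, predicted fail
    tn: int  # gold fail, predicted fail

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def confusion_from_labels(gold: Sequence[str], pred: Sequence[str],
                          positive: str = "pass") -> Confusion:
    """Count agreements and disagreements between gold and predicted labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = tn = 0
    for g, p in zip(gold, pred):
        if p == positive and g == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif g == positive:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def accuracy(c: Confusion) -> float:
    """Fraction of answers on which prediction and gold label agree."""
    return (c.tp + c.tn) / c.total


def f1_score(c: Confusion) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def cohen_kappa(c: Confusion) -> float:
    """Observed agreement corrected for the agreement expected by chance."""
    n = c.total
    p_observed = accuracy(c)
    # Chance agreement from the marginal frequencies of each label.
    p_expected = ((c.tp + c.fp) * (c.tp + c.fn)
                  + (c.tn + c.fn) * (c.tn + c.fp)) / (n * n)
    if p_expected == 1.0:
        return 0.0  # kappa is undefined here; 0.0 is a convention of this sketch
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels (NOT the paper's dataset): 100 graded answers.
gold = ["pass"] * 40 + ["fail"] * 5 + ["pass"] * 5 + ["fail"] * 50
pred = ["pass"] * 40 + ["pass"] * 5 + ["fail"] * 5 + ["fail"] * 50
demo = confusion_from_labels(gold, pred)
print(accuracy(demo), cohen_kappa(demo), f1_score(demo))
```

Kappa is lower than accuracy on the same predictions because it discounts the agreement that two graders would reach by chance given how often each label occurs.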

Notes

1. By using a normalised Levenshtein string similarity distance [17].
2. https://fasttext.cc/ (last accessed July 2019).
3. https://fasttext.cc/docs/en/crawl-vectors.html (last accessed July 2019).
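Note 1 refers to a normalised Levenshtein string similarity. The text available here does not state which normalisation the authors used, so the sketch below assumes one common choice: the edit distance divided by the length of the longer string, subtracted from 1. The function names and the example sentences are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical strings.

    Assumption of this sketch: normalise by the longer string's length.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical student answer compared with a hypothetical gold sentence.
student = "the difference is not statistically significant"
gold_answer = "the difference is statistically significant"
print(normalized_similarity(student, gold_answer))
```

The example pair differs by a single negation, so a character-level measure of this kind rates the two sentences as highly similar even though their meanings are opposite.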

References

1. Angelone, A.M., Menini, S., Tonelli, S., Vittorini, P.: Dataset: short sentences on R analyses in a health informatics subject, June 2019. https://doi.org/10.5281/ZENODO.3257363
2. Angelone, A.M., Vittorini, P.: The automated grading of R code snippets: preliminary results in a course of health informatics. In: Gennari, R., et al. (eds.) MIS4TEL 2019. AISC, vol. 1007, pp. 19–27. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-23990-9_3
3. Aprosio, A.P., Moretti, G.: Tint 2.0: an all-inclusive suite for NLP in Italian. In: Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Torino, Italy, 10–12 December 2018 (2018). http://ceur-ws.org/Vol-2253/paper58.pdf
4. Bernardi, A., et al.: On the design and development of an assessment system with adaptive capabilities. In: Di Mascio, T., et al. (eds.) MIS4TEL 2018. AISC, vol. 804, pp. 190–199. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-98872-6_23
5. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017). https://doi.org/10.1162/tacl_a_00051, https://www.aclweb.org/anthology/Q17-1010
6. Bowman, S.R., Angeli, G., Potts, C., Manning, C.D.: A large annotated corpus for learning natural language inference. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 632–642. Association for Computational Linguistics, Lisbon, September 2015. https://doi.org/10.18653/v1/D15-1075, https://www.aclweb.org/anthology/D15-1075
7. Burrows, S., Gurevych, I., Stein, B.: The eras and trends of automatic short answer grading. Int. J. Artif. Intell. Educ. 25(1), 60–117 (2015)
8. Cer, D., et al.: Universal sentence encoder. In: Submission to: EMNLP Demonstration, Brussels, Belgium (2018). https://arxiv.org/abs/1803.11175
9. Cicchetti, D.V.: Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol. Assess. 6(4), 284–290 (1994)
10. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, June 2019. https://www.aclweb.org/anthology/N19-1423
11. Gomaa, W.H., Fahmy, A.A.: A survey of text similarity approaches. Int. J. Comput. Appl. 68(13), 13–18 (2013). https://doi.org/10.5120/11638-7118
12. Harlen, W., James, M.: Assessment and learning: differences and relationships between formative and summative assessment. Assess. Educ.: Principles Policy Pract. 4(3), 365–379 (1997). https://doi.org/10.1080/0969594970040304
13. Hsu, C.W., Chang, C.C., Lin, C.J.: A practical guide to support vector classification. Technical report, National Taiwan University (2016)
14. Kiros, J., Chan, W.: InferLite: simple universal sentence representations from natural language inference data. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018, pp. 4868–4874 (2018). https://aclanthology.info/papers/D18-1524/d18-1524
15. Kuhn, M.: Building predictive models in R using the caret package. J. Stat. Softw. 28(5), 1–26 (2008). https://doi.org/10.18637/jss.v028.i05
16. Kusner, M., Sun, Y., Kolkin, N., Weinberger, K.: From word embeddings to document distances. In: International Conference on Machine Learning, pp. 957–966 (2015)
17. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions and reversals. In: Soviet Physics Doklady, vol. 10, p. 707 (1966)
18. Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F.: e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien (2019). https://CRAN.R-project.org/package=e1071. Accessed July 2019
19. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space (2013)
20. Mohler, M., Bunescu, R., Mihalcea, R.: Learning to grade short answer questions using semantic similarity measures and dependency graph alignments. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT 2011, pp. 752–762. Association for Computational Linguistics, Stroudsburg (2011). http://dl.acm.org/citation.cfm?id=2002472.2002568
21. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of EMNLP (2014)
22. Peters, M.E., et al.: Deep contextualized word representations. In: Walker, M.A., Ji, H., Stent, A. (eds.) NAACL-HLT, pp. 2227–2237. Association for Computational Linguistics (2018). http://dblp.uni-trier.de/db/conf/naacl/naacl2018-1.html#PetersNIGCLZ18
23. R Core Team: R: A Language and Environment for Statistical Computing (2018). https://www.R-project.org/
24. Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge (2001)
25. Souza, D.M., Felizardo, K.R., Barbosa, E.F.: A systematic literature review of assessment tools for programming assignments. In: 2016 IEEE 29th International Conference on Software Engineering Education and Training (CSEET), pp. 147–156. IEEE, April 2016. https://doi.org/10.1109/CSEET.2016.48
26. Urbanek, S.: rJava: Low-Level R to Java Interface, R package version 0.9-11 (2019). https://CRAN.R-project.org/package=rJava. Accessed July 2019

Author information

Authors and Affiliations

DISIM, University of L’Aquila, Via Vetoio, 67100, L’Aquila, Italy
Giovanni De Gasperis
FBK-DH, Via Sommarive 18, 38123, Povo, Italy
Stefano Menini & Sara Tonelli
MESVA, University of L’Aquila, P.le S. Tommasi 1, 67100, L’Aquila, Italy
Pierpaolo Vittorini

Corresponding author

Correspondence to Pierpaolo Vittorini.

Editor information

Editors and Affiliations

Magdeburg-Stendal University of Applied Sciences, Magdeburg, Germany
Michael A. Herzog
Comenius University in Bratislava, Bratislava, Slovakia
Zuzana Kubincová
Chongqing Academy of Science and Technology, Chongqing, China
Peng Han
Sapienza University, Rome, Italy
Marco Temperini

About this paper

Cite this paper

De Gasperis, G., Menini, S., Tonelli, S., Vittorini, P. (2019). Automated Grading of Short Text Answers: Preliminary Results in a Course of Health Informatics. In: Herzog, M., Kubincová, Z., Han, P., Temperini, M. (eds) Advances in Web-Based Learning – ICWL 2019. ICWL 2019. Lecture Notes in Computer Science, vol 11841. Springer, Cham. https://doi.org/10.1007/978-3-030-35758-0_18

DOI: https://doi.org/10.1007/978-3-030-35758-0_18
Published: 16 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35757-3
Online ISBN: 978-3-030-35758-0
eBook Packages: Computer Science, Computer Science (R0)
