
Going deeper: Automatic short-answer grading by combining student and question models

Published in: User Modeling and User-Adapted Interaction

Abstract

As educational technologies have rapidly become more powerful and more prevalent, especially from the 2010s onward, automated grading of natural-language responses has become a major area of research. In this work, we bring the classic student and domain/question models that are widely used in intelligent tutoring systems to the task of automatic short-answer grading (ASAG). ASAG applies natural language processing techniques to assess student-authored short answers; conventional ASAG systems focus mainly on the student answers themselves and are therefore referred to as answer-based. In recent years, deep learning models have gained great popularity across a wide range of domains. While classic machine learning methods have been widely applied to ASAG, to the best of our knowledge deep learning models had not been, probably because the lexical features extracted from short answers provide limited information. In this work, we explore the effectiveness of one deep learning model, the deep belief network (DBN), on the task of ASAG. Overall, our results on a real-world corpus demonstrate that (1) adding student and question models to the conventional answer-based approach can greatly enhance the performance of ASAG, and (2) deep learning models such as DBNs can be productively applied to the task of ASAG.
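The combination the abstract describes can be sketched in code. The feature names below are illustrative assumptions, not the authors' actual pipeline, and since scikit-learn offers no full DBN, stacking `BernoulliRBM` layers before a logistic classifier is used here as a common approximation of DBN-style greedy layer-wise pretraining:

```python
# Hypothetical sketch: concatenating answer-based, student-model, and
# question-model features, then classifying with RBM-pretrained layers
# in the spirit of a deep belief network.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

rng = np.random.RandomState(0)
n = 200

# Answer-based features (e.g., lexical-overlap scores scaled to [0, 1]).
answer_feats = rng.rand(n, 5)
# Student-model features (e.g., an estimated proficiency) and
# question-model features (e.g., an estimated difficulty).
student_feats = rng.rand(n, 2)
question_feats = rng.rand(n, 2)

# The "going deeper" step: one joint feature vector per answer.
X = np.hstack([answer_feats, student_feats, question_feats])
# Synthetic binary correct/incorrect labels, for illustration only.
y = (X.mean(axis=1) > 0.5).astype(int)

model = Pipeline([
    ("rbm1", BernoulliRBM(n_components=16, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=8, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
print("training accuracy: %.2f" % model.score(X, y))
```

An answer-based baseline would fit the same pipeline on `answer_feats` alone; the paper's claim is that the concatenated representation outperforms it.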




Acknowledgements

This research was supported by NSF Grants #1432156: 'Educational Data Mining for Individualized Instruction in STEM Learning Environments', #1651909: 'CAREER: Improving Adaptive Decision Making in Interactive Learning Environment', #1660878: 'MetaDash: A Teacher Dashboard Informed by Real-Time Multichannel Self-Regulated Learning Data' and #1726550: 'Integrated Data-driven Technologies for Individualized Instruction in STEM Learning Environments'.

Author information

Correspondence to Yuan Zhang.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Zhang, Y., Lin, C. & Chi, M. Going deeper: Automatic short-answer grading by combining student and question models. User Model User-Adap Inter 30, 51–80 (2020). https://doi.org/10.1007/s11257-019-09251-6

