
Going deeper: Automatic short-answer grading by combining student and question models

Published in: User Modeling and User-Adapted Interaction

Abstract

As educational technologies have rapidly become more powerful and more prevalent, especially from the 2010s onward, automated grading of natural-language responses has become a major area of research. In this work, we bring the classic student and domain/question models that are widely used in intelligent tutoring systems to the task of automatic short-answer grading (ASAG). ASAG applies natural language processing techniques to assess student-authored short answers; conventional ASAG systems focus mainly on the student answers themselves and are therefore referred to as answer-based. In recent years, deep learning models have gained great popularity across a wide range of domains. While classic machine learning methods have been widely applied to ASAG, to the best of our knowledge deep learning models had not been, probably because the lexical features extracted from short answers provide limited information. In this work, we explore the effectiveness of one deep learning model, the deep belief network (DBN), on the task of ASAG. Overall, our results on a real-world corpus demonstrate that (1) adding student and question models to the conventional answer-based approach can greatly enhance the performance of ASAG, and (2) deep learning models such as DBNs can be productively applied to the task of ASAG.
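The combination the abstract describes can be sketched in code. The feature names below are illustrative assumptions, not the authors' actual pipeline, and since scikit-learn offers no full DBN, stacking `BernoulliRBM` layers before a logistic classifier is used here as a common approximation of DBN-style greedy layer-wise pretraining:

```python
# Hypothetical sketch: concatenating answer-based, student-model, and
# question-model features, then classifying with RBM-pretrained layers
# in the spirit of a deep belief network.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

rng = np.random.RandomState(0)
n = 200

# Answer-based features (e.g., lexical-overlap scores scaled to [0, 1]).
answer_feats = rng.rand(n, 5)
# Student-model features (e.g., an estimated proficiency) and
# question-model features (e.g., an estimated difficulty).
student_feats = rng.rand(n, 2)
question_feats = rng.rand(n, 2)

# The "going deeper" step: one joint feature vector per answer.
X = np.hstack([answer_feats, student_feats, question_feats])
# Synthetic binary correct/incorrect labels, for illustration only.
y = (X.mean(axis=1) > 0.5).astype(int)

model = Pipeline([
    ("rbm1", BernoulliRBM(n_components=16, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=8, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
print("training accuracy: %.2f" % model.score(X, y))
```

An answer-based baseline would fit the same pipeline on `answer_feats` alone; the paper's claim is that the concatenated representation outperforms it.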




Acknowledgements

This research was supported by NSF Grants #1432156: 'Educational Data Mining for Individualized Instruction in STEM Learning Environments', #1651909: 'CAREER: Improving Adaptive Decision Making in Interactive Learning Environment', #1660878: 'MetaDash: A Teacher Dashboard Informed by Real-Time Multichannel Self-Regulated Learning Data' and #1726550: 'Integrated Data-driven Technologies for Individualized Instruction in STEM Learning Environments'.

Author information

Correspondence to Yuan Zhang.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Zhang, Y., Lin, C. & Chi, M. Going deeper: Automatic short-answer grading by combining student and question models. User Model User-Adap Inter 30, 51–80 (2020). https://doi.org/10.1007/s11257-019-09251-6

