ABSTRACT
Grading student work is critical for assessing their understanding and providing necessary feedback. However, answer grading can become monotonous for teachers. On the standard ASAG data set, our system shows substantial improvements in classification disparity of correct and incorrect answers from a reference answer compared to baseline methods. Our supervised model (1) utilizes recent advances in semantic word embeddings and (2) implements ideas from one-shot learning methods, which are proven to work with minimal. We present experimental results from a model based on different approaches and demonstrates decent performance on standard benchmark dataset.
Supplemental Material
Available for Download
- Daniel Bär, Chris Biemann, Iryna Gurevych, and Torsten Zesch. 2012. Ukp: Computing semantic textual similarity by combining multiple content similarity measures. In Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation. Association for Computational Linguistics, 435–440.Google Scholar
- Sumit Chopra, Raia Hadsell, and Yann LeCun. 2005. Learning a similarity metric discriminatively, with application to face verification. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, Vol. 1. IEEE, 539–546.Google ScholarDigital Library
- Gráinne Conole and Bill Warburton. 2005. A review of computer-assisted assessment. ALT-J 13, 1 (2005), 17–31.Google ScholarCross Ref
- Lushan Han, Abhay L Kashyap, Tim Finin, James Mayfield, and Jonathan Weese. 2013. UMBC_EBIQUITY-CORE: semantic textual similarity systems. In Second Joint Conference on Lexical and Computational Semantics (* SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, Vol. 1. 44–52.Google Scholar
- Christian Hänig, Robert Remus, and Xose De La Puente. 2015. Exb themis: Extensive feature extraction from word alignments for semantic textual similarity. In Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015). 264–268.Google ScholarCross Ref
- Michael Heilman and Nitin Madnani. 2013. ETS: Domain adaptation and stacking for short answer scoring. In Second Joint Conference on Lexical and Computational Semantics (* SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), Vol. 2. 275–279.Google Scholar
- Derrick Higgins, Jill Burstein, Daniel Marcu, and Claudia Gentile. 2004. Evaluating multiple aspects of coherence in student essays. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004.Google Scholar
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735–1780.Google ScholarDigital Library
- Sergio Jimenez, Claudia Becerra, and Alexander Gelbukh. 2013. SOFTCARDINALITY: Hierarchical text overlap for student response analysis. In Second Joint Conference on Lexical and Computational Semantics (* SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), Vol. 2. 280–284.Google Scholar
- Thomas K Landauer, Peter W Foltz, and Darrell Laham. 1998. An introduction to latent semantic analysis. Discourse processes 25, 2-3 (1998), 259–284.Google Scholar
- Claudia Leacock and Martin Chodorow. 2003. C-rater: Automated scoring of short-answer questions. Computers and the Humanities 37, 4 (2003), 389–405.Google ScholarCross Ref
- Claudia Leacock, George A Miller, and Martin Chodorow. 1998. Using corpus statistics and WordNet relations for sense identification. Computational Linguistics 24, 1 (1998), 147–165.Google ScholarDigital Library
- Michael Lesk. 1986. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In Proceedings of the 5th annual international conference on Systems documentation. ACM, 24–26.Google ScholarDigital Library
- André Lynum, Partha Pakray, Björn Gambäck, and Sergio Jimenez. 2014. NTNU: Measuring semantic similarity with sublexical feature representations and soft cardinality. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014). 448–453.Google ScholarCross Ref
- Manvi Mahana, Mishel Johns, and Ashwin Apte. 2012. Automated essay grading using machine learning. Mach. Learn. Session, Stanford University 5 (2012).Google Scholar
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 3111–3119.Google Scholar
- Tom Mitchell, Terry Russell, Peter Broomhead, and Nicola Aldridge. 2002. Towards robust computerised marking of free-text responses. (2002).Google Scholar
- Michael Mohler, Razvan Bunescu, and Rada Mihalcea. 2011. Learning to grade short answer questions using semantic similarity measures and dependency graph alignments. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics, 752–762.Google ScholarDigital Library
- Michael Mohler and Rada Mihalcea. 2009. Text-to-text semantic similarity for automatic short answer grading. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 567–575.Google ScholarDigital Library
- Rodney D Nielsen, Wayne Ward, and James H Martin. 2009. Recognizing entailment in intelligent tutoring systems. Natural Language Engineering 15, 4 (2009), 479–501.Google ScholarDigital Library
- Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 1532–1543.Google ScholarCross Ref
- Stephen G Pulman and Jana Z Sukkarieh. 2005. Automatic short answer marking. In Proceedings of the second workshop on Building Educational Applications Using NLP. Association for Computational Linguistics, 9–16.Google ScholarCross Ref
- Keisuke Sakaguchi, Michael Heilman, and Nitin Madnani. 2015. Effective feature integration for automated short answer scoring. In Proceedings of the 2015 conference of the North American Chapter of the association for computational linguistics: Human language technologies. 1049–1054.Google ScholarCross Ref
- Gerard Salton and Michael J McGill. 1986. Introduction to modern information retrieval. (1986).Google Scholar
- Richard Socher, Eric H Huang, Jeffrey Pennin, Christopher D Manning, and Andrew Y Ng. 2011. Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In Advances in neural information processing systems. 801–809.Google Scholar
- Jana Z Sukkarieh, Stephen G Pulman, and Nicholas Raikes. 2004. Auto-marking 2: An update on the UCLES-Oxford University research into using computational linguistics to score short, free text responses. International Association of Educational Assessment, Philadephia (2004).Google Scholar
- Md Arafat Sultan, Cristobal Salazar, and Tamara Sumner. 2016. Fast and easy short answer grading with high accuracy. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 1070–1075.Google ScholarCross Ref
- Hao-Chuan Wang, Chun-Yen Chang, and Tsai-Yen Li. 2008. Assessing creative problem-solving with automated text grading. Computers & Education 51, 4 (2008), 1450–1466.Google ScholarDigital Library
- Kilian Q Weinberger and Lawrence K Saul. 2009. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research 10, Feb (2009), 207–244.Google ScholarDigital Library
- Wenpeng Yin and Hinrich Schütze. 2015. Learning meta-embeddings by using ensembles of embedding sets. arXiv preprint arXiv:1508.04257(2015).Google Scholar
Index Terms
- Triplet Loss based Siamese Networks for Automatic Short Answer Grading
Recommendations
Hybrid Deep Learning CNN-Bidirectional LSTM and Manhattan Distance for Japanese Automated Short Answer Grading: Use case in Japanese Language Studies
ICCIP '22: Proceedings of the 8th International Conference on Communication and Information ProcessingThis paper discusses the development of an Automatic Essay Grading System (SIMPLE-O) designed using hybrid CNN and Bidirectional LSTM and Manhattan Distance for Japanese language course essay grading. The most stable and best model is trained using ...
A framework for effectively utilising human grading input in automated short answer grading
Short answer questions are effective for recall knowledge assessment. Grading a large amount of short answers is costly and time consuming. To apply short answer questions on MOOCs platforms, the issues of scalability and responsiveness must be addressed. ...
Automated Short-Answer Grading Using Deep Neural Networks and Item Response Theory
Artificial Intelligence in EducationAbstractAutomated short-answer grading (ASAG) methods using deep neural networks (DNN) have achieved state-of-the-art accuracy. However, further improvement is required for high-stakes and large-scale examinations because even a small scoring error will ...
Comments