Abstract
Accurately interpreting student responses is a critical requirement of dialog-based intelligent tutoring systems. The accuracy of supervised learning methods used for interpreting or analyzing student responses depends strongly on the availability of annotated training data. Collecting and grading student responses is tedious, time-consuming, and expensive. This work proposes an iterative data collection and grading approach. We show that data collection effort can be significantly reduced by predicting question difficulty and by collecting answers from a focused set of students. Further, grading effort can be reduced by filtering out student answers that are unlikely to help in training the Student Response Analyzer (SRA). To ensure the quality of grades, we analyze grader characteristics and show an improvement when a biased grader is removed. An experimental evaluation on a large-scale dataset shows a reduction of up to 28% in data collection cost and up to 10% in grading cost, while improving the macro-average F1 of response analysis.
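The abstract describes the approach only at a high level. As a reading aid, the following minimal Python sketch (an editorial illustration, not the authors' implementation) shows how such an iterative collect-filter-grade loop could be wired together; the function names predict_difficulty, is_informative, collect_answers, and grade, the random difficulty model, and the difficulty thresholds are all assumptions made for illustration.

```python
# Minimal sketch (not the authors' implementation) of an iterative
# collect-filter-grade loop of the kind described in the abstract.
# All function names, thresholds, and the random difficulty model are
# illustrative assumptions.
import random


def predict_difficulty(question):
    # Placeholder difficulty predictor; a real system might use an IRT
    # model or a text-based regressor. Returns a score in [0, 1].
    return random.random()


def is_informative(answer, already_graded):
    # Placeholder filter: skip answers identical to ones already graded.
    # A real filter might use textual similarity or model uncertainty.
    return answer not in already_graded


def iterative_collection(questions, collect_answers, grade,
                         rounds=3, low=0.3, high=0.7):
    """Collect and grade answers only where they are expected to help."""
    training_data = []  # list of (answer, grade) pairs
    for _ in range(rounds):
        # Focus collection on mid-difficulty questions, where answers are
        # assumed to be most informative for training the analyzer.
        focus = [q for q in questions
                 if low <= predict_difficulty(q) <= high]
        for question in focus:
            answers = collect_answers(question)
            graded_texts = {a for a, _ in training_data}
            kept = [a for a in answers if is_informative(a, graded_texts)]
            training_data.extend((a, grade(question, a)) for a in kept)
    return training_data


if __name__ == "__main__":
    # Toy usage with stub collection and grading callbacks.
    questions = ["Why does the bulb light up?", "What is electric current?"]
    data = iterative_collection(
        questions,
        collect_answers=lambda q: [f"student answer to: {q}"],
        grade=lambda q, a: "correct",
        rounds=1,
    )
    print(len(data), "graded answers collected")
```

Restricting collection to mid-difficulty questions and filtering near-duplicate answers before grading mirrors the two cost levers named in the abstract; the concrete difficulty model and filter would, of course, be specific to the actual SRA pipeline.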
Notes
- 1.
- 2. For simplicity, we assume that the costs of question creation, answer collection, and answer grading are uniform across questions and answers.
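Under this uniform-cost assumption, total annotation effort is simply linear in the number of questions created, answers collected, and answers graded. The short Python sketch below illustrates the bookkeeping; the unit costs and counts are hypothetical, and the 28% and 10% reductions are the figures reported in the abstract.

```python
# Illustrative bookkeeping under the uniform-cost assumption above: every
# question costs the same to create, and every answer costs the same to
# collect and to grade. The unit costs below are hypothetical.
def total_cost(n_questions, n_collected, n_graded,
               c_create=1.0, c_collect=0.2, c_grade=0.5):
    return (n_questions * c_create
            + n_collected * c_collect
            + n_graded * c_grade)


# Example: cutting answer collection by 28% and grading by 10% (the
# reductions reported in the abstract) lowers the total annotation cost.
baseline = total_cost(100, 5000, 5000)
reduced = total_cost(100, 5000 * 0.72, 5000 * 0.90)
print(f"baseline: {baseline:.0f}, reduced: {reduced:.0f}")
```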
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Dhamecha, T.I., Marvaniya, S., Saha, S., Sindhgatta, R., Sengupta, B. (2018). Balancing Human Efforts and Performance of Student Response Analyzer in Dialog-Based Tutors. In: Penstein Rosé, C., et al. (eds.) Artificial Intelligence in Education. AIED 2018. Lecture Notes in Computer Science, vol. 10947. Springer, Cham. https://doi.org/10.1007/978-3-319-93843-1_6
DOI: https://doi.org/10.1007/978-3-319-93843-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93842-4
Online ISBN: 978-3-319-93843-1
eBook Packages: Computer Science, Computer Science (R0)