
Balancing Human Efforts and Performance of Student Response Analyzer in Dialog-Based Tutors

  • Conference paper
Artificial Intelligence in Education (AIED 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10947)


Abstract

Accurately interpreting student responses is a critical requirement of dialog-based intelligent tutoring systems. The accuracy of the supervised learning methods used to interpret or analyze student responses depends strongly on the availability of annotated training data. Collecting and grading student responses is tedious, time-consuming, and expensive. This work proposes an iterative data collection and grading approach. We show that data collection effort can be reduced significantly by predicting question difficulty and collecting answers from a focused set of students. Grading effort can be reduced further by filtering out student answers that are unlikely to help in training the Student Response Analyzer (SRA). To ensure the quality of the grades, we analyze grader characteristics and show an improvement when a biased grader is removed. An experimental evaluation on a large-scale dataset shows a reduction of up to 28% in data collection cost and up to 10% in grading cost, while improving the macro-average F1 of response analysis.
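
The abstract compresses the proposed pipeline into a few sentences: predict question difficulty, collect answers from a focused set of students, filter answers that are unlikely to help training, check graders for bias, and retrain the SRA. The toy loop below is a minimal, hypothetical sketch of that kind of iterative collect-filter-grade procedure, written only to make the structure concrete; the data, helper functions, and heuristics are all invented here and are not the authors' implementation (grader-bias analysis and SRA retraining are elided).

    # A minimal, hypothetical sketch of an iterative data collection and
    # grading loop in the spirit of the abstract: estimate question
    # difficulty from the grades collected so far, focus the next round of
    # collection on the hardest questions, and filter redundant answers
    # before spending grading effort on them. All data and heuristics are
    # invented for illustration.
    import random
    from collections import Counter

    random.seed(0)

    GRADES = ["correct", "partially_correct", "incorrect"]
    # Toy answer pool: (question_id, answer_text, grade a human would assign).
    POOL = [(q, f"answer {i} to question {q}", random.choice(GRADES))
            for q in range(5) for i in range(20)]


    def predict_difficulty(labeled, n_questions=5):
        """Proxy difficulty: fraction of graded answers that were not 'correct'."""
        wrong = Counter(q for (q, _, g) in labeled if g != "correct")
        seen = Counter(q for (q, _, _) in labeled)
        return {q: wrong[q] / seen[q] if seen[q] else 0.5 for q in range(n_questions)}


    def filter_uninformative(batch, labeled):
        """Drop answers whose text duplicates one that has already been graded."""
        graded_texts = {text for (_, text, _) in labeled}
        return [ans for ans in batch if ans[1] not in graded_texts]


    def iterative_collection(rounds=3, batch_size=15):
        labeled, remaining = [], list(POOL)
        for _ in range(rounds):
            difficulty = predict_difficulty(labeled)
            # Focus collection effort on questions currently predicted hardest.
            remaining.sort(key=lambda ans: -difficulty[ans[0]])
            batch, remaining = remaining[:batch_size], remaining[batch_size:]
            # Filter before grading, so human effort goes only to new answers.
            labeled.extend(filter_uninformative(batch, labeled))
        return labeled


    graded = iterative_collection()
    print(f"graded {len(graded)} of {len(POOL)} collected answers")

In the paper's setting, the difficulty predictor, the answer filter, and the grader-quality checks are the substantive contributions; the sketch only shows where such components would plug into an iterative loop.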


Notes

  1. https://github.com/facebookresearch/InferSent.

  2. For simplicity, we assume that the costs of question creation, answer collection, and answer grading are uniform across questions and answers (see the cost sketch following these notes).
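
Note 2 amounts to a simple linear budget for human effort. The formula below only illustrates that assumption; the symbols are introduced here for exposition and are not the paper's notation: |Q|, |A|, and |G| denote the numbers of questions created, answers collected, and answers graded, and c_q, c_a, c_g their uniform unit costs.

    \mathrm{Cost} = c_q\,|Q| + c_a\,|A| + c_g\,|G|

Under such a model, the savings reported in the abstract correspond to shrinking |A| (up to 28% less data collection cost) and |G| (up to 10% less grading cost) while maintaining or improving the SRA's macro-average F1.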

Author information

Corresponding author

Correspondence to Tejas I. Dhamecha.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Dhamecha, T.I., Marvaniya, S., Saha, S., Sindhgatta, R., Sengupta, B. (2018). Balancing Human Efforts and Performance of Student Response Analyzer in Dialog-Based Tutors. In: Penstein Rosé, C., et al. (eds.) Artificial Intelligence in Education. AIED 2018. Lecture Notes in Computer Science, vol. 10947. Springer, Cham. https://doi.org/10.1007/978-3-319-93843-1_6

  • DOI: https://doi.org/10.1007/978-3-319-93843-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93842-4

  • Online ISBN: 978-3-319-93843-1

  • eBook Packages: Computer Science, Computer Science (R0)
