
The Impact of Data Quantity and Source on the Quality of Data-Driven Hints for Programming

  • Conference paper
Artificial Intelligence in Education (AIED 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10947)

Included in the following conference series: International Conference on Artificial Intelligence in Education

Abstract

In the domain of programming, intelligent tutoring systems increasingly employ data-driven methods to automate hint generation. Evaluations of these systems have largely focused on whether they can reliably provide hints for most students, and how much data is needed to do so, rather than how useful the resulting hints are to students. We present a method for evaluating the quality of data-driven hints and how their quality is impacted by the data used to generate them. Using two datasets, we investigate how the quantity of data and the source of data (whether it comes from students or experts) impact one hint generation algorithm. We find that with student training data, hint quality stops improving after 15–20 training solutions and can decrease with additional data. We also find that student data outperforms a single expert solution but that a comprehensive set of expert solutions generally performs best.
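The evaluation the abstract describes, sweeping the amount of training data and measuring the quality of the resulting hints, can be pictured with the minimal sketch below. This is not the authors' implementation: generate_hint, rate_hint_quality, and the dataset arguments are hypothetical placeholders for whichever hint generation algorithm and quality rating are under study.

    import random

    def quality_vs_training_size(student_solutions, hint_requests,
                                 generate_hint, rate_hint_quality,
                                 max_size=30, trials=200):
        """Sketch of the sweep described in the abstract: for each training-set
        size i, repeatedly sample i student solutions, generate a hint for each
        held-out hint request, and record the mean rated hint quality."""
        mean_quality = {}
        for i in range(1, max_size + 1):
            trial_means = []
            for _ in range(trials):
                training = random.sample(student_solutions, i)
                scores = [rate_hint_quality(generate_hint(request, training))
                          for request in hint_requests]
                trial_means.append(sum(scores) / len(scores))
            mean_quality[i] = sum(trial_means) / len(trial_means)
        # A plateau around i = 15-20 would correspond to the reported result
        # for student training data.
        return mean_quality

Run with a single expert solution or a full set of expert solutions in place of the sampled student data, the same loop would give the OneExpert and AllExpert comparisons referenced in note 3 below.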


Notes

  1. Full datasets and hint ratings are available at go.ncsu.edu/hint-quality-data.

  2. Over 200 trials, the standard error of the averaged QualityScores was always less than 0.01, and averaged less than 0.0025 across values of i (a minimal illustration follows these notes).

  3. Because the OneExpert baseline uses only one training solution, both weighting approaches produce the same results. For simplicity, Fig. 1 shows only voting-based weighting for the AllExpert baseline.
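The stability check in note 2 amounts to the standard error of the mean QualityScore across the 200 trials at each training-set size. A minimal sketch, with hypothetical variable names:

    import statistics

    def standard_error_of_mean(per_trial_mean_scores):
        # Standard error of the averaged QualityScore over repeated trials;
        # note 2 reports this stayed below 0.01 across 200 trials.
        n = len(per_trial_mean_scores)
        return statistics.stdev(per_trial_mean_scores) / n ** 0.5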


Acknowledgements

The authors thank Veronica Cateté for her contributions to this research. This work was supported by the National Science Foundation under grant 1623470.

Author information


Corresponding author

Correspondence to Thomas W. Price.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Price, T.W., Zhi, R., Dong, Y., Lytle, N., Barnes, T. (2018). The Impact of Data Quantity and Source on the Quality of Data-Driven Hints for Programming. In: Penstein Rosé, C., et al. Artificial Intelligence in Education. AIED 2018. Lecture Notes in Computer Science (LNAI), vol. 10947. Springer, Cham. https://doi.org/10.1007/978-3-319-93843-1_35


  • DOI: https://doi.org/10.1007/978-3-319-93843-1_35


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93842-4

  • Online ISBN: 978-3-319-93843-1

  • eBook Packages: Computer Science, Computer Science (R0)
