
The Impact of Data Quantity and Source on the Quality of Data-Driven Hints for Programming

  • Conference paper
Artificial Intelligence in Education (AIED 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10947)

Included in the following conference series: International Conference on Artificial Intelligence in Education

Abstract

In the domain of programming, intelligent tutoring systems increasingly employ data-driven methods to automate hint generation. Evaluations of these systems have largely focused on whether they can reliably provide hints for most students, and how much data is needed to do so, rather than how useful the resulting hints are to students. We present a method for evaluating the quality of data-driven hints and how their quality is impacted by the data used to generate them. Using two datasets, we investigate how the quantity of data and the source of data (whether it comes from students or experts) impact one hint generation algorithm. We find that with student training data, hint quality stops improving after 15–20 training solutions and can decrease with additional data. We also find that student data outperforms a single expert solution but that a comprehensive set of expert solutions generally performs best.
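The evaluation the abstract describes, sweeping the amount of training data and measuring the quality of the resulting hints, can be pictured with the minimal sketch below. This is not the authors' implementation: generate_hint, rate_hint_quality, and the dataset arguments are hypothetical placeholders for whichever hint generation algorithm and quality rating are under study.

    import random

    def quality_vs_training_size(student_solutions, hint_requests,
                                 generate_hint, rate_hint_quality,
                                 max_size=30, trials=200):
        """Sketch of the sweep described in the abstract: for each training-set
        size i, repeatedly sample i student solutions, generate a hint for each
        held-out hint request, and record the mean rated hint quality."""
        mean_quality = {}
        for i in range(1, max_size + 1):
            trial_means = []
            for _ in range(trials):
                training = random.sample(student_solutions, i)
                scores = [rate_hint_quality(generate_hint(request, training))
                          for request in hint_requests]
                trial_means.append(sum(scores) / len(scores))
            mean_quality[i] = sum(trial_means) / len(trial_means)
        # A plateau around i = 15-20 would correspond to the reported result
        # for student training data.
        return mean_quality

Run with a single expert solution or a full set of expert solutions in place of the sampled student data, the same loop would give the OneExpert and AllExpert comparisons referenced in note 3 below.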


Notes

  1. Full datasets and hint ratings are available at go.ncsu.edu/hint-quality-data.

  2. Over 200 trials, the standard error of the averaged QualityScores was always less than 0.01, and averaged less than 0.0025 across values of i (a minimal illustration follows these notes).

  3. Because the OneExpert baseline uses only one training solution, both weighting approaches produce the same results. For simplicity, Fig. 1 shows only voting-based weighting for the AllExpert baseline.
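The stability check in note 2 amounts to the standard error of the mean QualityScore across the 200 trials at each training-set size. A minimal sketch, with hypothetical variable names:

    import statistics

    def standard_error_of_mean(per_trial_mean_scores):
        # Standard error of the averaged QualityScore over repeated trials;
        # note 2 reports this stayed below 0.01 across 200 trials.
        n = len(per_trial_mean_scores)
        return statistics.stdev(per_trial_mean_scores) / n ** 0.5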


Acknowledgements

The authors thank Veronica Cateté for her contributions to this research. This work was supported by the National Science Foundation under grant 1623470.

Author information


Corresponding author

Correspondence to Thomas W. Price.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Price, T.W., Zhi, R., Dong, Y., Lytle, N., Barnes, T. (2018). The Impact of Data Quantity and Source on the Quality of Data-Driven Hints for Programming. In: Penstein Rosé, C., et al. Artificial Intelligence in Education. AIED 2018. Lecture Notes in Computer Science (LNAI), vol. 10947. Springer, Cham. https://doi.org/10.1007/978-3-319-93843-1_35


  • DOI: https://doi.org/10.1007/978-3-319-93843-1_35


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93842-4

  • Online ISBN: 978-3-319-93843-1

  • eBook Packages: Computer Science, Computer Science (R0)
