DOI: 10.1145/3576050.3576116
Research article

The Student Zipf Theory: Inferring Latent Structures in Open-Ended Student Work To Help Educators

Published: 13 March 2023

ABSTRACT

Are there structures underlying student work that are universal across open-ended tasks? We demonstrate that, across many subjects and assignment types, the probability distribution underlying student-generated open-ended work closely follows Zipf’s Law. Inferring this latent structure for classroom assignments can help learning analytics researchers, instruction designers, and educators understand the landscape of student approaches, assess the complexity of assignments, and prioritise pedagogical attention. However, typical classrooms are far too small to reveal even the contour of the Zipfian pattern, and it is generally impossible to perform inference for Zipf’s law from so few samples. We formalise this difficult task as the Zipf Inference Challenge: (1) infer the ordering of student-generated works by their underlying probabilities, and (2) estimate the shape parameter of the underlying distribution in a typical-sized classroom. Our key insight in addressing this challenge is to leverage the density of the student response landscape as represented by semantic similarity. We show that our “Semantic Density Estimation” method is substantially more accurate at inferring the latent Zipf shape and the probability ordering of student responses on real-world education datasets.
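Zipf’s Law says the r-th most probable response occurs with probability roughly proportional to r^(-s), where s is the shape parameter the challenge asks us to estimate. The abstract does not detail the estimator itself, so the following is a minimal sketch, under our own assumptions, of what a semantic-density approach could look like: rank responses by the density of their neighbourhood in an embedding space (a proxy for underlying probability), then fit s to that ordering. The function names, the Gaussian KDE, and the log-log regression are illustrative choices, not the paper’s published method.

```python
# Illustrative sketch only: the Gaussian KDE and the log-log fit are our
# assumptions about a semantic-density approach, not the paper's method.
import numpy as np
from scipy.stats import gaussian_kde

def rank_by_semantic_density(embeddings: np.ndarray) -> np.ndarray:
    """Order response indices from densest (most probable) to sparsest.

    `embeddings` is an (n_responses, dim) matrix of semantic vectors,
    e.g. from a sentence or code encoder; here it is taken as given.
    """
    kde = gaussian_kde(embeddings.T)   # density over the response landscape
    densities = kde(embeddings.T)      # estimated density at each response
    return np.argsort(-densities)      # densest response first

def fit_zipf_shape(sorted_densities: np.ndarray) -> float:
    """Fit log p ~ -s * log(rank) + c by least squares and return s."""
    ranks = np.arange(1, len(sorted_densities) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(sorted_densities), 1)
    return -slope

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "classroom": 8-d embeddings clustered around five common
    # approaches (Zipf-ish cluster sizes) plus three one-off solutions.
    centers = rng.normal(size=(5, 8))
    counts = [15, 10, 6, 4, 2]
    X = np.vstack(
        [c + 0.1 * rng.normal(size=(k, 8)) for c, k in zip(centers, counts)]
        + [rng.normal(size=(3, 8))]
    )
    order = rank_by_semantic_density(X)
    densities = np.sort(gaussian_kde(X.T)(X.T))[::-1]
    print("densest response index:", order[0])
    print(f"estimated Zipf shape s = {fit_zipf_shape(densities):.2f}")
```

On toy data the recovered s is only a rough estimate; the point of the sketch is the two-step structure (density-based ranking, then shape fitting), which mirrors the two parts of the Zipf Inference Challenge posed in the abstract.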


Published in

LAK23: 13th International Learning Analytics and Knowledge Conference
March 2023, 692 pages
ISBN: 9781450398657
DOI: 10.1145/3576050
Copyright © 2023 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
