ABSTRACT
Are there structures underlying student work that are universal across every open-ended task? We demonstrate that, across many subjects and assignment types, the probability distribution underlying student-generated open-ended work is close to Zipf’s Law. Inferring this latent structure for classroom assignments can help learning analytics researchers, instructional designers, and educators understand the landscape of student approaches, assess the complexity of assignments, and prioritise pedagogical attention. However, typical classrooms are far too small to reveal even the contour of the Zipfian pattern, and it is generally impossible to perform inference for Zipf’s law from so few samples. We formalise this difficult task as the Zipf Inference Challenge: (1) infer the ordering of student-generated works by their underlying probabilities, and (2) estimate the shape parameter of the underlying distribution in a typical-sized classroom. Our key insight in addressing this challenge is to leverage the densities of the student response landscapes represented by semantic similarity. We show that our “Semantic Density Estimation” method is substantially better at inferring the latent Zipf shape and the probability ordering of student responses on real-world education datasets.
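To make the sample-size problem concrete, the following is a minimal sketch and not the Semantic Density Estimation method itself. It assumes a truncated Zipf distribution in which the response of rank k has probability proportional to k^(-s), with an illustrative shape s = 1 and 10,000 possible distinct responses. The function names and all parameter values are hypothetical choices for illustration, not taken from the paper.

```python
import random
from collections import Counter


def zipf_probs(num_types, s):
    """Truncated Zipf probabilities: p(rank k) is proportional to k**(-s)."""
    weights = [k ** (-s) for k in range(1, num_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_classroom(num_students, num_types, s, rng):
    """Draw one response rank per student from the truncated Zipf distribution."""
    ranks = list(range(1, num_types + 1))
    return rng.choices(ranks, weights=zipf_probs(num_types, s), k=num_students)


def singleton_fraction(responses):
    """Fraction of distinct responses submitted by exactly one student."""
    counts = Counter(responses)
    return sum(1 for c in counts.values() if c == 1) / len(counts)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    # Compare a classroom-sized sample with a much larger (assumed) cohort.
    for num_students in (30, 30000):
        responses = sample_classroom(num_students, num_types=10000, s=1.0, rng=rng)
        distinct = len(set(responses))
        print(num_students, distinct, round(singleton_fraction(responses), 2))
```

Under these assumptions, responses that appear only once carry no frequency information about their relative rank. Comparing the singleton fraction for the two sample sizes indicates how little of the ordering raw counts alone can recover at classroom scale.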
Index Terms
- The Student Zipf Theory: Inferring Latent Structures in Open-Ended Student Work To Help Educators