ABSTRACT
Measuring algorithmic bias in machine learning has historically focused on statistical inequalities pertaining to specific groups. However, the most common metrics (i.e., those focused on individual- or group-conditioned error rates) are currently ill-suited to educational settings because they assume that each observation is independent of the others. This assumption is not statistically appropriate for many common educational outcomes, since such metrics cannot account for the relationships among students in the same classroom or for multiple observations per student across an academic year. In this paper, we present novel adaptations of algorithmic bias metrics for regression that accommodate both independent and nested data structures. Using hierarchical linear models, we rigorously measure algorithmic bias in a machine learning model of the relationship between student engagement in an intelligent tutoring system and year-end standardized test scores. We conclude that classroom-level influences had a small but significant effect on models. Examining significance with hierarchical linear models helps determine which inequalities in educational settings might be explained by small sample sizes rather than systematic differences.
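The independence problem described above can be sketched numerically. The following hypothetical Python example (the function name and data are illustrative, not from the paper; the paper uses hierarchical linear models, whereas this sketch uses a simpler one-way ANOVA estimator) computes the intraclass correlation (ICC) of prediction residuals grouped by classroom — the variance partition that motivates a multilevel analysis of bias metrics.

```python
# Hypothetical sketch: one-way ANOVA estimate of the intraclass correlation
# (ICC) of model residuals nested within classrooms. All names and data are
# illustrative; a hierarchical linear model estimates this partition formally.
from statistics import mean

def icc(residuals_by_class):
    """Estimate the ICC from balanced groups of prediction residuals.

    residuals_by_class: list of lists, one inner list per classroom.
    Values near 0 suggest classroom membership explains little error
    variance; values near 1 suggest strongly clustered residuals.
    """
    k = len(residuals_by_class)            # number of classrooms
    n = len(residuals_by_class[0])         # students per classroom (balanced)
    grand = mean(r for grp in residuals_by_class for r in grp)
    # Between-classroom mean square
    msb = n * sum((mean(g) - grand) ** 2 for g in residuals_by_class) / (k - 1)
    # Within-classroom mean square
    msw = sum((r - mean(g)) ** 2
              for g in residuals_by_class for r in g) / (k * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)

# Classrooms whose residuals share a common offset yield a high ICC,
# meaning per-student error observations are not independent.
clustered = [[1.0, 1.1, 0.9], [-1.0, -0.9, -1.1], [0.1, 0.0, -0.1]]
print(round(icc(clustered), 2))  # → 0.99
```

A high ICC indicates that residuals within a classroom are correlated, so bias metrics that treat each student as an independent observation will misstate their uncertainty — the situation the hierarchical adaptations in this paper are designed to handle.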
Index Terms
- Hierarchical Dependencies in Classroom Settings Influence Algorithmic Bias Metrics