skip to main content
DOI: 10.1145/3430895.3460141
Public Access

Domain Experts' Interpretations of Assessment Bias in a Scaled, Online Computer Science Curriculum

Published: 08 June 2021

Abstract

Understanding inequity at scale is necessary for designing equitable online learning experiences, but also difficult. Statistical techniques like differential item functioning (DIF) can help identify whether items/questions in an assessment exhibit potential bias by disadvantaging certain groups (e.g. whether an item disadvantages women compared to men of equivalent knowledge). While testing companies typically use DIF to identify items to remove, we explored how domain experts such as curriculum designers could use DIF to understand how to design instructional materials that better serve students from diverse groups. Using Code.org's online Computer Science Discoveries (CSD) curriculum, we analyzed 139,097 responses from 19,617 students to identify DIF by gender and race in assessment items (e.g. multiple choice questions). Of the curriculum's 17 assessment items, we identified six that disadvantaged students who reported as female when compared to students who reported as non-binary or male. We also found that most of the items (13) disadvantaged AHNP (African/Black, Hispanic/Latinx, Native American/Alaskan Native, Pacific Islander) students compared to WA (white, Asian) students. We then conducted a workshop and interviews with seven curriculum designers and found that they interpreted item bias relative to an intersection of item features and student identity, the broader curriculum, and differing uses for assessments. We interpreted these findings in the broader context of using data on assessment bias to inform domain experts' efforts to design more equitable learning experiences.
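
The abstract does not spell out the DIF procedure itself. As a rough, hypothetical illustration of the underlying idea, the sketch below screens a single dichotomous item for DIF with the Mantel-Haenszel method, one common DIF technique; it is not the paper's analysis, and all names here (the `mantel_haenszel_dif` helper, column names, and the toy data) are assumptions made for illustration only.

```python
# Hypothetical sketch of a Mantel-Haenszel DIF screen for one dichotomous item.
# This is NOT the paper's analysis; it only illustrates the general idea of
# comparing groups of students with equivalent knowledge (matched on total score).
import numpy as np
import pandas as pd


def mantel_haenszel_dif(df, item, group_col, ref_group, focal_group, total_col):
    """Return the Mantel-Haenszel common odds ratio and ETS delta (MH D-DIF)."""
    num, den = 0.0, 0.0
    # Stratify by total score so that the reference- and focal-group students
    # being compared have similar overall knowledge.
    for _, stratum in df.groupby(total_col):
        n = len(stratum)
        ref = stratum.loc[stratum[group_col] == ref_group, item]
        foc = stratum.loc[stratum[group_col] == focal_group, item]
        a = (ref == 1).sum()  # reference group, item correct
        b = (ref == 0).sum()  # reference group, item incorrect
        c = (foc == 1).sum()  # focal group, item correct
        d = (foc == 0).sum()  # focal group, item incorrect
        num += a * d / n
        den += b * c / n
    alpha = num / den if den > 0 else float("nan")  # common odds ratio across strata
    delta = -2.35 * np.log(alpha)  # ETS delta scale; |delta| >= 1.5 is conventionally flagged as large DIF
    return alpha, delta


# Toy, made-up data: 0/1 responses to a hypothetical item "q3".
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "q3": rng.integers(0, 2, size=500),
    "gender": rng.choice(["female", "male"], size=500),
    "total_score": rng.integers(0, 18, size=500),
})
print(mantel_haenszel_dif(toy, "q3", "gender",
                          ref_group="male", focal_group="female",
                          total_col="total_score"))
```

The design choice this sketch shares with the study's framing is that comparisons are made only between students of equivalent knowledge (here approximated by matching on total score), so that a group difference on an item signals potential item bias rather than a difference in overall knowledge.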

Supplementary Material

MP4 File (las21_talk.mp4)
Data can help us identify the existence and extent of disparities and bias in a learning experience, but we need further contextualization to take equitable action. This study explores how domain experts (curriculum designers) interpreted data on test question bias (Differential Item Functioning, DIF) for students of equivalent knowledge levels but different genders and races. Analyzing the responses of nearly 20,000 students to Code.org's middle school Computer Science Discoveries (CSD) assessments, we found questions that disadvantaged students who reported as female and/or as AHNP (African/Black, Hispanic/Latinx, Native American/Alaskan Native, and Pacific Islander). Curriculum designers considered this test question bias relative to both test design and curriculum design. This work contributed the largest DIF analysis in the field of computing education as well as a new way to use DIF to inform the design of more equitable learning experiences.

Information

Published In

L@S '21: Proceedings of the Eighth ACM Conference on Learning @ Scale
June 2021
380 pages
ISBN: 9781450382151
DOI: 10.1145/3430895
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 June 2021

Author Tags

  1. assessment interpretation and use
  2. computing education
  3. differential item functioning
  4. item response theory
  5. test bias
  6. validity

Qualifiers

  • Research-article

Conference

L@S '21
L@S '21: Eighth (2021) ACM Conference on Learning @ Scale
June 22 - 25, 2021
Virtual Event, Germany

Acceptance Rates

Overall Acceptance Rate: 117 of 440 submissions, 27%

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 99
  • Downloads (Last 6 weeks): 16
Reflects downloads up to 13 Feb 2025

Cited By

  • (2024) "More Robots are Coming: Large Multimodal Models (ChatGPT) can Solve Visually Diverse Images of Parsons Problems." Proceedings of the 26th Australasian Computing Education Conference, 29-38. DOI: 10.1145/3636243.3636247. Online publication date: 29-Jan-2024.
  • (2024) "Exploring the Impact of Assessment Policies on Marginalized Students' Experiences in Post-Secondary Programming Courses." Proceedings of the 2024 ACM Conference on International Computing Education Research - Volume 1, 233-245. DOI: 10.1145/3632620.3671100. Online publication date: 12-Aug-2024.
