ABSTRACT
Reliable and valid exams are a crucial part of both sound research design and trustworthy assessment of student knowledge. Assessing and addressing item bias is an essential step in building a validity argument for any assessment instrument. Despite calls for valid assessment tools in CS, item bias is rarely investigated. What kinds of item bias might appear in conventional CS1 exams? To investigate this, we examined responses to a final exam in a large CS1 course, using differential item functioning (DIF) methods to investigate bias related to binary gender and year of study. Although not a published assessment instrument, the exam had a format similar to many exams used in higher education and in research: students were asked to trace code and write programs using paper and pencil. One item with significant DIF was detected on the exam, though its magnitude was negligible. This case study demonstrates how to detect DIF items so that future researchers and practitioners can conduct similar analyses.
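For readers who want to run such a screen themselves, the standard logistic-regression DIF procedure regresses each dichotomous item on a matching criterion (e.g., a total or rest score), the grouping variable, and their interaction; nested models are then compared with likelihood-ratio tests, and the change in pseudo-R² serves as an effect-size check on flagged items. The sketch below is a minimal Python illustration on synthetic data, assuming `statsmodels` and `scipy` are available; the variable names and simulated ability score are placeholders, not the paper's actual data or pipeline.

```python
# Minimal sketch of logistic-regression DIF screening for one dichotomous item.
# All data here are synthetic and the column names (`total`, `group`, `item`)
# are illustrative assumptions, not taken from the paper's dataset.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 400
total = rng.normal(0.0, 1.0, n)   # matching criterion (stand-in for total/rest score)
group = rng.integers(0, 2, n)     # 0 = reference group, 1 = focal group
# Simulate responses with a small uniform-DIF effect (the 0.4 * group term).
p = 1.0 / (1.0 + np.exp(-(0.2 + 1.2 * total + 0.4 * group)))
item = rng.binomial(1, p)

def fit_logit(X):
    """Fit a logistic regression of the item on the given predictors."""
    return sm.Logit(item, sm.add_constant(X)).fit(disp=0)

m_base    = fit_logit(np.column_stack([total]))                        # ability only
m_uniform = fit_logit(np.column_stack([total, group]))                 # + group
m_full    = fit_logit(np.column_stack([total, group, total * group]))  # + interaction

# Likelihood-ratio tests (1 df each): uniform DIF, then non-uniform DIF.
lr_uniform = 2 * (m_uniform.llf - m_base.llf)
lr_nonunif = 2 * (m_full.llf - m_uniform.llf)
# Magnitude: change in McFadden pseudo-R^2 across the nested models, one
# common effect-size screen for separating significant from negligible DIF.
delta_r2 = m_full.prsquared - m_base.prsquared

print(f"uniform DIF:     LR = {lr_uniform:.2f}, p = {chi2.sf(lr_uniform, 1):.4f}")
print(f"non-uniform DIF: LR = {lr_nonunif:.2f}, p = {chi2.sf(lr_nonunif, 1):.4f}")
print(f"effect size (pseudo-R^2 change) = {delta_r2:.4f}")
```

In practice this screen is run per item with a multiple-comparison correction, and an item is flagged only when the likelihood-ratio test is significant *and* the pseudo-R² change exceeds an effect-size threshold; that two-part criterion is what allows a result like the one reported here, where an item shows statistically significant DIF of negligible magnitude.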
Recommendations
Domain Experts' Interpretations of Assessment Bias in a Scaled, Online Computer Science Curriculum
L@S '21: Proceedings of the Eighth ACM Conference on Learning @ Scale
Understanding inequity at scale is necessary for designing equitable online learning experiences, but also difficult. Statistical techniques like differential item functioning (DIF) can help identify whether items/questions in an assessment exhibit ...
Replicating a Validated CS1 Assessment (Abstract Only)
SIGCSE '16: Proceedings of the 47th ACM Technical Symposium on Computing Science Education
Validated assessments are important for teachers and researchers. A validated assessment is carefully developed to make sure that it is measuring the right things. Computing education needs more and better validated assessments. Validated assessments ...
Intersectional Biases Within an Introductory Computing Assessment
SIGCSE 2024: Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1
Assessments that can measure student understanding of concepts in a reliable and valid way are incredibly valuable in research. Unfortunately, assessments can be a source of bias, differentially impacting students along various demographic lines. ...