
Are We Fair?: Quantifying Score Impacts of Computer Science Exams with Randomized Question Pools

Published: 22 February 2022

Abstract

With the increase in large-enrollment courses and the growing need to offer online instruction, computer-based exams randomly generated from question pools have clear benefits for computing courses. Such exams can be administered at scale, scheduled asynchronously and/or online, and versioned to make attempts at cheating less profitable. Despite these benefits, we want to ensure that the technique is not unfair to students, particularly with respect to equivalent difficulty across exam versions.
To investigate the fairness of generated exams, we fit a Generalized Partial Credit Model (GPCM), an Item Response Theory (IRT) model, to exams from a for-majors data structures course and a non-majors CS0 course, both of which used randomly generated exams. For all exams, students' estimated ability and exam score are strongly correlated (ρ ≥ 0.7), suggesting that the exams are reasonably fair. Through simulation, we find that most of the variance in any given student's simulated scores is due to chance, and that the worst score impact from possibly unfair permutations is only around 5 percentage points on an exam. We discuss implications of this work and possible future steps.
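
To make the kind of analysis described above concrete, the sketch below (Python, using numpy and scipy) simulates randomized exams assembled from question pools under an assumed one-parameter logistic response model, then reports the ability-score correlation across a class and the score spread a single student would see across many random versions. The pool sizes, difficulty values, and response model are illustrative assumptions, not the paper's data or the authors' actual GPCM fitting code.

# Illustrative sketch (not the authors' code): simulate randomized exams built
# from question pools and measure (a) how well total score tracks ability and
# (b) how much a single student's score varies across random versions.
# Pool sizes, difficulties, and the 1PL-style scoring model are assumptions.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

N_POOLS = 10          # one question drawn per pool per exam version (assumed)
POOL_SIZE = 4         # interchangeable alternatives within each pool (assumed)
N_STUDENTS = 500

# Assumed per-question difficulties; pools are roughly matched but not identical.
difficulty = rng.normal(0.0, 1.0, size=(N_POOLS, POOL_SIZE))
ability = rng.normal(0.0, 1.0, size=N_STUDENTS)

def simulate_exam(theta, rng):
    """Score one randomly generated version for a student with ability theta,
    using a simple logistic (1PL-style) response model as a stand-in for the
    paper's GPCM."""
    picks = rng.integers(0, POOL_SIZE, size=N_POOLS)   # one item per pool
    chosen_difficulty = difficulty[np.arange(N_POOLS), picks]
    p_correct = 1.0 / (1.0 + np.exp(-(theta - chosen_difficulty)))
    return (rng.random(N_POOLS) < p_correct).mean()    # fraction of items correct

# (a) ability vs. score across the class, one random version per student
scores = np.array([simulate_exam(t, rng) for t in ability])
rho, _ = spearmanr(ability, scores)
print(f"ability-score correlation: rho = {rho:.2f}")

# (b) score spread for one fixed student (theta = 0) across many random versions
versions = np.array([simulate_exam(0.0, rng) for _ in range(2000)])
print(f"score range across versions for one student: "
      f"{versions.min():.2f} to {versions.max():.2f}, sd = {versions.std():.2f}")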


Cited By

  • Beyond Question Shuffling: Randomization Techniques in Programming Assessment. 2023 IEEE Frontiers in Education Conference (FIE), 1-9. https://doi.org/10.1109/FIE58773.2023.10342976 (published 18 Oct 2023)
  • Techniques for detecting and deterring cheating in home exams in programming. 2022 IEEE Frontiers in Education Conference (FIE), 1-8. https://doi.org/10.1109/FIE56618.2022.9962547 (published 8 Oct 2022)


    Published In

    SIGCSE 2022: Proceedings of the 53rd ACM Technical Symposium on Computer Science Education - Volume 1
    February 2022, 1049 pages
    ISBN: 9781450390705
    DOI: 10.1145/3478431

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. assessment
    2. exam generation
    3. fairness
    4. randomized exams

    Qualifiers

    • Research-article

    Conference

    SIGCSE 2022

    Acceptance Rates

    Overall Acceptance Rate 1,595 of 4,542 submissions, 35%

