
Are We Fair?: Quantifying Score Impacts of Computer Science Exams with Randomized Question Pools

Published: 22 February 2022

Abstract

With the increase in large-enrollment courses and the growing need to offer online instruction, computer-based exams randomly generated from question pools have clear benefits for computing courses. Such exams can be administered at scale, scheduled asynchronously and/or online, and versioned to make attempts at cheating less profitable. Despite these benefits, we want to ensure that the technique is not unfair to students, particularly with respect to equivalent difficulty across exam versions.
To investigate the fairness of generated exams, we fit a Generalized Partial Credit Model (GPCM), an Item Response Theory (IRT) model, to exams from a for-majors data structures course and a non-majors CS0 course, both of which used randomly generated exams. For all exams, students' estimated ability and exam score are strongly correlated (ρ ≥ 0.7), suggesting that the exams are reasonably fair. Through simulation, we find that most of the variance in any given student's simulated scores is due to chance, and that the worst score impact from possibly unfair permutations is only around 5 percentage points on an exam. We discuss implications of this work and possible future steps.
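
To make the kind of analysis described above concrete, the sketch below (Python, using numpy and scipy) simulates randomized exams assembled from question pools under an assumed one-parameter logistic response model, then reports the ability-score correlation across a class and the score spread a single student would see across many random versions. The pool sizes, difficulty values, and response model are illustrative assumptions, not the paper's data or the authors' actual GPCM fitting code.

# Illustrative sketch (not the authors' code): simulate randomized exams built
# from question pools and measure (a) how well total score tracks ability and
# (b) how much a single student's score varies across random versions.
# Pool sizes, difficulties, and the 1PL-style scoring model are assumptions.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

N_POOLS = 10          # one question drawn per pool per exam version (assumed)
POOL_SIZE = 4         # interchangeable alternatives within each pool (assumed)
N_STUDENTS = 500

# Assumed per-question difficulties; pools are roughly matched but not identical.
difficulty = rng.normal(0.0, 1.0, size=(N_POOLS, POOL_SIZE))
ability = rng.normal(0.0, 1.0, size=N_STUDENTS)

def simulate_exam(theta, rng):
    """Score one randomly generated version for a student with ability theta,
    using a simple logistic (1PL-style) response model as a stand-in for the
    paper's GPCM."""
    picks = rng.integers(0, POOL_SIZE, size=N_POOLS)   # one item per pool
    chosen_difficulty = difficulty[np.arange(N_POOLS), picks]
    p_correct = 1.0 / (1.0 + np.exp(-(theta - chosen_difficulty)))
    return (rng.random(N_POOLS) < p_correct).mean()    # fraction of items correct

# (a) ability vs. score across the class, one random version per student
scores = np.array([simulate_exam(t, rng) for t in ability])
rho, _ = spearmanr(ability, scores)
print(f"ability-score correlation: rho = {rho:.2f}")

# (b) score spread for one fixed student (theta = 0) across many random versions
versions = np.array([simulate_exam(0.0, rng) for _ in range(2000)])
print(f"score range across versions for one student: "
      f"{versions.min():.2f} to {versions.max():.2f}, sd = {versions.std():.2f}")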


Cited By

  • Beyond Question Shuffling: Randomization Techniques in Programming Assessment. 2023 IEEE Frontiers in Education Conference (FIE), 1-9. https://doi.org/10.1109/FIE58773.2023.10342976 (published 18 Oct 2023)
  • Techniques for detecting and deterring cheating in home exams in programming. 2022 IEEE Frontiers in Education Conference (FIE), 1-8. https://doi.org/10.1109/FIE56618.2022.9962547 (published 8 Oct 2022)


    Published In

    SIGCSE 2022: Proceedings of the 53rd ACM Technical Symposium on Computer Science Education - Volume 1
    February 2022, 1049 pages
    ISBN: 9781450390705
    DOI: 10.1145/3478431

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. assessment
    2. exam generation
    3. fairness
    4. randomized exams

    Qualifiers

    • Research-article

    Conference

    SIGCSE 2022

    Acceptance Rates

    Overall Acceptance Rate 1,595 of 4,542 submissions, 35%

