ABSTRACT
Designing assessments in classroom contexts, or having them generated automatically, requires, among other things, knowledge about the difficulty of what is assessed. Estimates of difficulty can be derived empirically, usually by piloting items, or theoretically from models; empirical results, in turn, can inform theory and refine the models. In this article, we compare four methods of estimating item difficulty for a typical topic of introductory programming courses: control flow. For a set of items that have been tested empirically, we also collected expert ratings and applied measures of code complexity from both software engineering and computer science education research. The results show some overlap between empirical results and theoretical predictions. However, for the simple item format we have been using, all of the models fall short of explaining the observed variance in difficulty. Empirical difficulty, in turn, can serve as the basis for rules to be used for item generation in the future.
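As background for the comparison, the sketch below contrasts two of the estimate families named in the abstract: empirical difficulty in the classical test theory sense (the proportion of correct responses to an item) and a code complexity measure from software engineering (McCabe's cyclomatic complexity, here approximated by counting branch keywords). This is a minimal illustration only; the item names, response data, and the keyword-count approximation are assumptions for the example, not material from the study.

```python
# Illustrative sketch (assumed data, not from the study) contrasting two
# families of item-difficulty estimates discussed in the article.
import re

# Hypothetical response matrix: item name -> scored answers (1 = correct).
responses = {
    "single_if":   [1, 1, 1, 0, 1, 1, 1, 1],
    "nested_loop": [1, 0, 0, 1, 0, 0, 1, 0],
}

def empirical_difficulty(scores):
    """Classical item difficulty (p-value): proportion of correct responses.
    Higher values indicate easier items."""
    return sum(scores) / len(scores)

def cyclomatic_complexity(code):
    """Rough McCabe cyclomatic complexity for Python-like code: one plus
    the number of decision points (branch keywords and short-circuit
    operators). A crude stand-in for a real control-flow-graph analysis."""
    decisions = re.findall(r"\b(?:if|elif|for|while|and|or)\b", code)
    return 1 + len(decisions)

# A toy control-flow item of the kind the abstract describes.
item_code = """
total = 0
for i in range(3):
    if i % 2 == 0:
        total += i
"""

if __name__ == "__main__":
    for name, scores in responses.items():
        print(f"{name}: empirical p = {empirical_difficulty(scores):.2f}")
    print(f"cyclomatic complexity of item: {cyclomatic_complexity(item_code)}")
```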
Recommendations
Item Difficulty Analysis of English Vocabulary Questions
CSEDU 2016: Proceedings of the 8th International Conference on Computer Supported Education
This study investigates the relations between several factors of question items in English vocabulary tests and the corresponding item difficulty. Designing the item difficulty of a test impacts the quality of the test itself. Our goal is suggesting a ...
A Systematic Review of Data-Driven Approaches to Item Difficulty Prediction
Artificial Intelligence in Education
Assessment quality and validity is heavily reliant on the quality of items included in an assessment or test. Difficulty is an essential factor that can determine items and tests’ overall quality. Therefore, item difficulty prediction is extremely ...
Towards validity for a formative assessment for language-specific program tracing skills
Koli Calling '19: Proceedings of the 19th Koli Calling International Conference on Computing Education Research
Formative assessments can have positive effects on learning, but few exist for computing, even for basic skills such as program tracing. Instead, teachers often rely on overly broad test questions that lack the diagnostic granularity needed to measure ...