Abstract
A large portion of the cost of any software lies in the time developers spend understanding a program’s source code before any changes can be undertaken. Measuring program comprehension is not a trivial task; different studies use self-reported and various psycho-physiological measures as proxies. In this research, we propose a methodology that uses functional Near-Infrared Spectroscopy (fNIRS) and eye-tracking devices as an objective measure of program comprehension, allowing researchers to conduct studies in environments close to real-world settings, at the granularity of individual identifiers. We validate our methodology and apply it to study the impact of lexical, structural, and readability issues on developers’ cognitive load during bug localization tasks. Our study involves 25 undergraduate and graduate students and 21 metrics. Results show that the existence of lexical inconsistencies in the source code significantly increases the cognitive load experienced by participants, not only on identifiers involved in the inconsistencies but throughout the entire code snippet. We did not find statistical evidence that structural inconsistencies increase the average cognitive load that participants experience; however, both types of inconsistencies result in lower performance in terms of time and success rate. Finally, we observe that self-reported task difficulty, cognitive load, and fixation duration do not correlate and appear to measure different aspects of task difficulty.
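The fNIRS signal underlying the cognitive-load measure is conventionally derived from raw optical-density changes via the modified Beer-Lambert law (Delpy et al. 1988; Baker et al. 2014): changes measured at two wavelengths are inverted through the extinction coefficients of oxygenated and deoxygenated hemoglobin. A minimal sketch of that conversion follows; the extinction coefficients, wavelengths, source-detector separation, and differential pathlength factor below are illustrative assumptions, not calibrated values from the study.

```python
import numpy as np

# Illustrative extinction coefficients [1/(mM*cm)] for HbO and HbR
# at two wavelengths typical of fNIRS devices (assumed values).
EXT = np.array([[0.27, 1.55],   # ~730 nm: [eps_HbO, eps_HbR]
                [1.15, 0.78]])  # ~850 nm: [eps_HbO, eps_HbR]
SOURCE_DETECTOR_CM = 3.0        # source-detector separation (assumed)
DPF = 6.0                       # differential pathlength factor (assumed)

def hemoglobin_deltas(delta_od):
    """Solve the modified Beer-Lambert system for concentration changes.

    delta_od: optical-density changes at the two wavelengths.
    Returns [dHbO, dHbR] in mM, by inverting
    dOD(lambda) = (eps_HbO*dHbO + eps_HbR*dHbR) * d * DPF.
    """
    effective_path = SOURCE_DETECTOR_CM * DPF
    return np.linalg.solve(EXT * effective_path, np.asarray(delta_od, float))

# Example: a simultaneous rise in optical density at both wavelengths.
d_hbo, d_hbr = hemoglobin_deltas([0.02, 0.05])
```

A rise in HbO accompanied by a drop in HbR is the hemodynamic pattern usually read as increased cortical activation, which is what fNIRS-based cognitive-load indices build on.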
Notes
The experiment was approved through a full board review for human subject research from the Institutional Review Board (IRB) at Washington State University (IRB #16113).
References
Abebe SL, Arnaoudova V, Tonella P, Antoniol G, Guéhéneuc YG (2012) Can lexicon bad smells improve fault prediction? In: Proceedings of the Working Conference on Reverse Engineering (WCRE), pp 235–244
Afergan D, Peck EM, Solovey ET, Jenkins A, Hincks SW, Brown ET, Chang R, Jacob RJ (2014) Dynamic difficulty using brain metrics of workload. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, pp 3797–3806
Aghajani E, Nagy C, Bavota G, Lanza M (2018) A large-scale empirical study on linguistic antipatterns affecting APIs. In: Proceedings of the International Conference on Software Maintenance and Evolution (ICSME), pp 25–35
Arnaoudova V, Di Penta M, Antoniol G, Guéhéneuc Y-G (2013) A new family of software anti-patterns: Linguistic anti-patterns. In: Proceedings of the European Conference on Software Maintenance and Reengineering (CSMR), pp 187–196
Arnaoudova V, Di Penta M, Antoniol G (2016) Linguistic antipatterns: What they are and how developers perceive them. Empir Softw Eng (EMSE) 21(1):104–158
Baker WB, Parthasarathy AB, Busch DR, Mesquita RC, Greenberg JH, Yodh A (2014) Modified Beer-Lambert law for blood flow. Biomed Opt Express 5(11):4053–4075
Binkley D, Davis M, Lawrie D, Maletic JI, Morrell C, Sharif B (2013) The impact of identifier style on effort and comprehension. Empir Softw Eng (EMSE) 18(2):219–276
Binkley D, Davis M, Lawrie D, Morrell C (2009a) To CamelCase or Under_score. In: Proceedings of the International Conference on Program Comprehension (ICPC), pp 158–167
Binkley D, Lawrie D, Maex S, Morrell C (2009b) Identifier length and limited programmer memory. Sci Comput Program 74(7):430–445
BIOPAC (2018a) Biopac homepage, https://www.biopac.com
BIOPAC (2018b) fnirsoft user manual, https://www.biopac.com/wp-content/uploads/fnirsoft-user-manual.pdf
Blackwell AF (2006) Metaphors we program by: space, action and society in Java. In: PPIG, p 8
Buse RP, Weimer W (2010) Learning a metric for code readability. IEEE Trans Softw Eng (TSE) 36(4):546–558
Butler S, Wermelinger M, Yu Y, Sharp H (2009) Relating identifier naming flaws and code quality: an empirical study. In: Proceedings of the Working Conference on Reverse Engineering (WCRE), pp 31–35
Castelhano J, Duarte IC, Ferreira C, Duraes J, Madeira H, Castelo-Branco M (2018) The role of the insula in intuitive expert bug detection in computer code: an fMRI study. Brain Imaging and Behavior, pp 1–15
Causse M, Chua Z, Peysakhovich V, Del Campo N, Matton N (2017) Mental workload and neural efficiency quantified in the prefrontal cortex using fNIRS. Sci Rep 7(1):5222
Deissenboeck F, Pizka M (2006) Concise and consistent naming. Softw Qual J 14(3):261–282
Delpy DT, Cope M, van der Zee P, Arridge S, Wray S, Wyatt J (1988) Estimation of optical pathlength through tissue from direct time of flight measurement. Phys Med Biol 33(12):1433
Duraes J, Madeira H, Castelhano J, Duarte C, Branco MC (2016) WAP: Understanding the brain at software debugging. In: Proceedings of the International Symposium on Software Reliability Engineering (ISSRE), pp 87–92
Eclipse (2018) Eclipse ide, https://www.eclipse.org/ide
Ehlis A-C, Schneider S, Dresler T, Fallgatter AJ (2014) Application of functional near-infrared spectroscopy in psychiatry. Neuroimage 85:478–488
EyeTribe (2018) The eye tribe homepage, https://theeyetribe.com
Fakhoury S (2018a) Online replication package, https://github.com/smfakhoury/fNIRS-and-cognitive-load
Fakhoury S, Ma Y, Arnaoudova V, Adesope O (2018b) The effect of poor source code lexicon and readability on developers’ cognitive load. In: Proceedings of the International Conference on Program Comprehension (ICPC)
Fishburn FA, Norr ME, Medvedev AV, Vaidya CJ (2014) Sensitivity of fNIRS to cognitive state and load. Front Hum Neurosci 8:76
Floyd B, Santander T, Weimer W (2017) Decoding the representation of code in the brain: an fMRI study of code review and expertise. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 175–186
Fritz T, Begel A, Muller SC, Yigit-Elliott S, Zuger M (2014) Using psycho-physiological measures to assess task difficulty in software development. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 402–413
Girouard A, Solovey ET, Hirshfield LM, Chauncey K, Sassaroli A, Fantini S, Jacob RJ (2009) Distinguishing difficulty levels with non-invasive brain activity measurements. In: IFIP Conference on human-computer interaction. Springer, pp 440–452
Grissom RJ, Kim JJ (2005) Effect sizes for research: A broad practical approach, 2nd edn. Lawrence Earlbaum Associates
Halstead MH (1977) Elements of software science
Herff C, Heger D, Fortmann O, Hennrich J, Putze F, Schultz T (2014) Mental workload during n-back task quantified in the prefrontal cortex using fNIRS. Front Hum Neurosci 7:935
Hochstein L, Basili VR, Zelkowitz MV, Hollingsworth JK, Carver J (2005) Combining self-reported and automatic data to improve programming effort measurement. SIGSOFT Softw Eng Notes 30(5):356–365
Ikutani Y, Uwano H (2014) Brain activity measurement during program comprehension with NIRS. In: Proceedings of the International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pp 1–6
Jaafar F, Guéhéneuc Y-G, Hamel S, Khomh F (2013) Mining the relationship between anti-patterns dependencies and fault-proneness. In: Proceedings of the Working Conference on Reverse Engineering (WCRE), pp 351–360
Khomh F, Penta MD, Guéhéneuc Y-G, Antoniol G (2012) An exploratory study of the impact of antipatterns on class change- and fault-proneness. Empirical Softw Eng 17(3):243–275
Kruggel F, von Cramon DY (1999) Temporal properties of the hemodynamic response in functional MRI. Hum Brain Mapp 8(4):259–271
Lawrie D, Morrell C, Feild H, Binkley D (2006) What’s in a name? A study of identifiers. In: Proceedings of International Conference on Program Comprehension (ICPC), pp 3–12
Lee S, Hooshyar D, Ji H, Nam K, Lim H (2017) Mining biometric data to predict programmer expertise and task difficulty. Clust Comput 21:1–11
Liblit B, Begel A, Sweetser E (2006) Cognitive perspectives on the role of naming in computer programs. In: PPIG, p 11
Marcus A, Poshyvanyk D, Ferenc R (2008) Using the conceptual cohesion of classes for fault prediction in object-oriented systems. IEEE Trans Softw Eng (TSE) 34(2):287–300
McCabe TJ (1976) A complexity measure. IEEE Trans Softw Eng (TSE) SE-2(4):308–320
Muller SC, Fritz T (2016) Using (bio)metrics to predict code quality online. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 452–463
Nakagawa T, Kamei Y, Uwano H, Monden A, Matsumoto K, German DM (2014) Quantifying programmers’ mental workload during program comprehension based on cerebral blood flow measurement: a controlled experiment. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 448–451
Ooms K, Dupont L, Lapon L, Popelka S (2015) Accuracy and precision of fixation locations recorded with the low-cost eye tribe tracker in different experimental setups. J Eye Mov Res 8(1):1–24
Peitek N, Siegmund J, Parnin C, Apel S, Hofmeister J, Brechmann A (2018) Simultaneous measurement of program comprehension with fMRI and eye tracking: a case study. In: Proceedings of the International Symposium on Empirical Software Engineering and Measurement (ESEM)
Poshyvanyk D, Guéhéneuc Y-G, Marcus A, Antoniol G, Rajlich V (2006) Combining probabilistic ranking and latent semantic indexing for feature identification. In: Proceedings of the International Conference on Program Comprehension (ICPC), pp 137–148
Posnett D, Hindle A, Devanbu P (2011) A simpler model of software readability. In: Proceedings of the Working Conference on Mining Software Repositories (MSR), pp 73–82
Rayner K (1998) Eye movements in reading and information processing: 20 years of research. Psychol Bull 124(3):372–422
Salman I, Misirli AT, Juristo N (2015) Are students representatives of professionals in software engineering experiments?. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 666–676
Scanniello G, Risi M (2013) Dealing with faults in source code: Abbreviated vs. full-word identifier names. In: Proceedings of the International Conference on Software Maintenance (ICSM), pp 190–199
Scalabrino S, Linares-Vasquez M, Poshyvanyk D, Oliveto R (2016) Improving code readability models with textual features. In: Proceedings of the International Conference on Program Comprehension (ICPC), pp 1–10
Shaffer T, Wise JL, Walters BM, Müller SC, Falcone M, Sharif B (2015) iTrace: Enabling eye tracking on software artifacts within the IDE to support software engineering tasks. In: Proceedings of the Joint Meeting on Foundations of Software Engineering (ESEC/FSE), pp 954–957
Sharafi Z, Soh Z, Guéhéneuc Y-G, Antoniol G (2012) Women and men — different but equal: on the impact of identifier style on source code reading. In: Proceedings of the International Conference on Program Comprehension (ICPC), pp 27–36
Sharafi Z, Shaffer T, Sharif B, Guéhéneuc Y-G (2015a) Eye-tracking metrics in software engineering. In: Proceedings of the Asia-Pacific Software Engineering Conference (APSEC), pp 96–103
Sharafi Z, Soh Z, Guéhéneuc Y-G (2015b) A systematic literature review on the usage of eye-tracking in software engineering. Inf Softw Technol 67:79–107
Sharif B, Falcone M, Maletic JI (2012) An eye-tracking study on the role of scan time in finding source code defects. In: Proceedings of the Symposium on Eye Tracking Research and Applications (ETRA), pp 381–384
Siegmund J, Kastner C, Apel S, Parnin C, Bethmann A, Leich T, Saake G, Brechmann A (2014) Understanding understanding source code with functional magnetic resonance imaging. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 378–389
Siegmund J, Peitek N, Parnin C, Apel S, Hofmeister J, Kastner C, Begel A, Bethmann A, Brechmann A (2017) Measuring neural efficiency of program comprehension. In: Proceedings of the Joint Meeting on Foundations of Software Engineering (ESEC/FSE), pp 140–150
Sokal RR, Michener CD (1958) A statistical method for evaluating systematic relationships. Univ Kansas Sci Bull 38:1409–1438
Takang AA, Grubb PA, Macredie RD (1996) The effects of comments and identifier names on program comprehensibility: an experimental investigation. J Prog Lang 4(3):143–167
Treacy Solovey E, Afergan D, Peck EM, Hincks SW, Jacob RJK (2015) Designing Implicit Interfaces for Physiological Computing. ACM Trans Comput-Hum Interact 21(6):1–27
Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2000) Experimentation in software engineering - an introduction. Kluwer Academic Publishers, Norwell
Yin RK (1994) Case Study Research: Design and Methods, 2nd edn. Sage Publications, New York
Acknowledgements
This work is supported by the NSF (award number CCF-1755995). The authors thank Thom Hemenway, Keon Sadatian, Nehemiah Salo, and Kyle Tilton for their help in developing tools for the environment in which we conducted the experiment. We also thank all students that participated in the experiment for their time and effort.
Additional information
Communicated by: Chanchal Roy, Janet Siegmund, and David Lo
This work is an extension of our previous paper (Fakhoury et al. 2018b).
Cite this article
Fakhoury, S., Roy, D., Ma, Y. et al. Measuring the impact of lexical and structural inconsistencies on developers’ cognitive load during bug localization. Empir Software Eng 25, 2140–2178 (2020). https://doi.org/10.1007/s10664-019-09751-4