Abstract
A large portion of the cost of any software lies in the time developers spend understanding a program’s source code before any changes can be undertaken. Measuring program comprehension is not a trivial task; different studies use self-reported and various psycho-physiological measures as proxies. In this research, we propose a methodology that uses functional Near-Infrared Spectroscopy (fNIRS) and eye-tracking devices as an objective measure of program comprehension, allowing researchers to conduct studies in environments close to real-world settings, at the granularity of individual identifiers. We validate our methodology and apply it to study the impact of lexical, structural, and readability issues on developers’ cognitive load during bug localization tasks. Our study involves 25 undergraduate and graduate students and 21 metrics. Results show that the existence of lexical inconsistencies in the source code significantly increases the cognitive load experienced by participants, not only on identifiers involved in the inconsistencies but throughout the entire code snippet. We did not find statistical evidence that structural inconsistencies increase the average cognitive load that participants experience; however, both types of inconsistencies result in lower performance in terms of time and success rate. Finally, we observe that self-reported task difficulty, cognitive load, and fixation duration do not correlate and appear to measure different aspects of task difficulty.
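The fNIRS signal underlying the cognitive-load measure is conventionally derived from raw optical-density changes via the modified Beer-Lambert law (Delpy et al. 1988; Baker et al. 2014): changes measured at two wavelengths are inverted through the extinction coefficients of oxygenated and deoxygenated hemoglobin. A minimal sketch of that conversion follows; the extinction coefficients, wavelengths, source-detector separation, and differential pathlength factor below are illustrative assumptions, not calibrated values from the study.

```python
import numpy as np

# Illustrative extinction coefficients [1/(mM*cm)] for HbO and HbR
# at two wavelengths typical of fNIRS devices (assumed values).
EXT = np.array([[0.27, 1.55],   # ~730 nm: [eps_HbO, eps_HbR]
                [1.15, 0.78]])  # ~850 nm: [eps_HbO, eps_HbR]
SOURCE_DETECTOR_CM = 3.0        # source-detector separation (assumed)
DPF = 6.0                       # differential pathlength factor (assumed)

def hemoglobin_deltas(delta_od):
    """Solve the modified Beer-Lambert system for concentration changes.

    delta_od: optical-density changes at the two wavelengths.
    Returns [dHbO, dHbR] in mM, by inverting
    dOD(lambda) = (eps_HbO*dHbO + eps_HbR*dHbR) * d * DPF.
    """
    effective_path = SOURCE_DETECTOR_CM * DPF
    return np.linalg.solve(EXT * effective_path, np.asarray(delta_od, float))

# Example: a simultaneous rise in optical density at both wavelengths.
d_hbo, d_hbr = hemoglobin_deltas([0.02, 0.05])
```

A rise in HbO accompanied by a drop in HbR is the hemodynamic pattern usually read as increased cortical activation, which is what fNIRS-based cognitive-load indices build on.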
Notes
The experiment was approved through a full board review for human subject research from the Institutional Review Board (IRB) at Washington State University (IRB #16113).
References
Abebe SL, Arnaoudova V, Tonella P, Antoniol G, Guéhéneuc YG (2012) Can lexicon bad smells improve fault prediction? In: Proceedings of the Working Conference on Reverse Engineering (WCRE), pp 235–244
Afergan D, Peck EM, Solovey ET, Jenkins A, Hincks SW, Brown ET, Chang R, Jacob RJ (2014) Dynamic difficulty using brain metrics of workload. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, pp 3797–3806
Aghajani E, Nagy C, Bavota G, Lanza M (2018) A large-scale empirical study on linguistic antipatterns affecting APIs. In: Proceedings of the International Conference on Software Maintenance and Evolution (ICSME), pp 25–35
Arnaoudova V, Di Penta M, Antoniol G, Guéhéneuc Y-G (2013) A new family of software anti-patterns: Linguistic anti-patterns. In: Proceedings of the European Conference on Software Maintenance and Reengineering (CSMR), pp 187–196
Arnaoudova V, Di Penta M, Antoniol G (2016) Linguistic antipatterns: What they are and how developers perceive them. Empir Softw Eng (EMSE) 21(1):104–158
Baker WB, Parthasarathy AB, Busch DR, Mesquita RC, Greenberg JH, Yodh A (2014) Modified Beer-Lambert law for blood flow. Biomed Opt Express 5(11):4053–4075
Binkley D, Davis M, Lawrie D, Maletic JI, Morrell C, Sharif B (2013) The impact of identifier style on effort and comprehension. Empir Softw Eng (EMSE) 18(2):219–276
Binkley D, Davis M, Lawrie D, Morrell C (2009a) To CamelCase or Under_score. In: Proceedings of the International Conference on Program Comprehension (ICPC), pp 158–167
Binkley D, Lawrie D, Maex S, Morrell C (2009b) Identifier length and limited programmer memory. Sci Comput Program 74(7):430–445
BIOPAC (2018a) Biopac homepage, https://www.biopac.com
BIOPAC (2018b) fnirsoft user manual, https://www.biopac.com/wp-content/uploads/fnirsoft-user-manual.pdf
Blackwell AF (2006) Metaphors we program by: space, action and society in Java. In: PPIG, p 8
Buse RP, Weimer W (2010) Learning a metric for code readability. IEEE Trans Softw Eng (TSE) 36(4):546–558
Butler S, Wermelinger M, Yu Y, Sharp H (2009) Relating identifier naming flaws and code quality: an empirical study. In: Proceedings of the Working Conference on Reverse Engineering (WCRE), pp 31–35
Castelhano J, Duarte IC, Ferreira C, Duraes J, Madeira H, Castelo-Branco M (2018) The role of the insula in intuitive expert bug detection in computer code: an fMRI study. Brain Imaging and Behavior, pp 1–15
Causse M, Chua Z, Peysakhovich V, Del Campo N, Matton N (2017) Mental workload and neural efficiency quantified in the prefrontal cortex using fNIRS. Sci Rep 7(1):5222
Deissenboeck F, Pizka M (2006) Concise and consistent naming. Softw Qual J 14(3):261–282
Delpy DT, Cope M, van der Zee P, Arridge S, Wray S, Wyatt J (1988) Estimation of optical pathlength through tissue from direct time of flight measurement. Phys Med Biol 33(12):1433
Duraes J, Madeira H, Castelhano J, Duarte C, Branco MC (2016) WAP: Understanding the brain at software debugging. In: Proceedings of the International Symposium on Software Reliability Engineering (ISSRE), pp 87–92
Eclipse (2018) Eclipse ide, https://www.eclipse.org/ide
Ehlis A-C, Schneider S, Dresler T, Fallgatter AJ (2014) Application of functional near-infrared spectroscopy in psychiatry. Neuroimage 85:478–488
EyeTribe (2018) The eye tribe homepage, https://theeyetribe.com
Fakhoury S (2018a) Online replication package, https://github.com/smfakhoury/fNIRS-and-cognitive-load
Fakhoury S, Ma Y, Arnaoudova V, Adesope O (2018b) The effect of poor source code lexicon and readability on developers’ cognitive load. In: Proceedings of the International Conference on Program Comprehension (ICPC)
Fishburn FA, Norr ME, Medvedev AV, Vaidya CJ (2014) Sensitivity of fNIRS to cognitive state and load. Front Hum Neurosci 8:76
Floyd B, Santander T, Weimer W (2017) Decoding the representation of code in the brain: an fMRI study of code review and expertise. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 175–186
Fritz T, Begel A, Muller SC, Yigit-Elliott S, Zuger M (2014) Using psycho-physiological measures to assess task difficulty in software development. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 402–413
Girouard A, Solovey ET, Hirshfield LM, Chauncey K, Sassaroli A, Fantini S, Jacob RJ (2009) Distinguishing difficulty levels with non-invasive brain activity measurements. In: IFIP Conference on human-computer interaction. Springer, pp 440–452
Grissom RJ, Kim JJ (2005) Effect sizes for research: A broad practical approach, 2nd edn. Lawrence Earlbaum Associates
Halstead MH (1977) Elements of software science
Herff C, Heger D, Fortmann O, Hennrich J, Putze F, Schultz T (2014) Mental workload during n-back task quantified in the prefrontal cortex using fNIRS. Front Hum Neurosci 7:935
Hochstein L, Basili VR, Zelkowitz MV, Hollingsworth JK, Carver J (2005) Combining self-reported and automatic data to improve programming effort measurement. SIGSOFT Softw Eng Notes 30(5):356–365
Ikutani Y, Uwano H (2014) Brain activity measurement during program comprehension with NIRS. In: Proceedings of the International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pp 1–6
Jaafar F, Guéhéneuc Y-G, Hamel S, Khomh F (2013) Mining the relationship between anti-patterns dependencies and fault-proneness. In: Proceedings of the Working Conference on Reverse Engineering (WCRE), pp 351–360
Khomh F, Penta MD, Guéhéneuc Y-G, Antoniol G (2012) An exploratory study of the impact of antipatterns on class change- and fault-proneness. Empirical Softw Eng 17(3):243–275
Kruggel F, von Cramon DY (1999) Temporal properties of the hemodynamic response in functional MRI. Hum Brain Mapp 8(4):259–271
Lawrie D, Morrell C, Feild H, Binkley D (2006) What’s in a name? A study of identifiers. In: Proceedings of International Conference on Program Comprehension (ICPC), pp 3–12
Lee S, Hooshyar D, Ji H, Nam K, Lim H (2017) Mining biometric data to predict programmer expertise and task difficulty. Clust Comput 21:1–11
Liblit B, Begel A, Sweetser E (2006) Cognitive perspectives on the role of naming in computer programs. In: PPIG, p 11
Marcus A, Poshyvanyk D, Ferenc R (2008) Using the conceptual cohesion of classes for fault prediction in object-oriented systems. IEEE Trans Softw Eng (TSE) 34(2):287–300
McCabe TJ (1976) A complexity measure. IEEE Trans Softw Eng (TSE) SE-2(4):308–320
Muller SC, Fritz T (2016) Using (bio)metrics to predict code quality online. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 452–463
Nakagawa T, Kamei Y, Uwano H, Monden A, Matsumoto K, German DM (2014) Quantifying programmers’ mental workload during program comprehension based on cerebral blood flow measurement: a controlled experiment. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 448–451
Ooms K, Dupont L, Lapon L, Popelka S (2015) Accuracy and precision of fixation locations recorded with the low-cost eye tribe tracker in different experimental setups. J Eye Mov Res 8(1):1–24
Peitek N, Siegmund J, Parnin C, Apel S, Hofmeister J, Brechmann A (2018) Simultaneous measurement of program comprehension with fMRI and eye tracking: a case study. In: Proceedings of the International Symposium on Empirical Software Engineering and Measurement (ESEM)
Poshyvanyk D, Guéhéneuc Y-G, Marcus A, Antoniol G, Rajlich V (2006) Combining probabilistic ranking and latent semantic indexing for feature identification. In: Proceedings of the International Conference on Program Comprehension (ICPC), pp 137–148
Posnett D, Hindle A, Devanbu P (2011) A simpler model of software readability. In: Proceedings of the Working Conference on Mining Software Repositories (MSR), pp 73–82
Rayner K (1998) Eye movements in reading and information processing: 20 years of research. Psychol Bull 124(3):372–422
Salman I, Misirli AT, Juristo N (2015) Are students representatives of professionals in software engineering experiments?. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 666–676
Scanniello G, Risi M (2013) Dealing with faults in source code: Abbreviated vs. full-word identifier names. In: Proceedings of the International Conference on Software Maintenance (ICSM), pp 190–199
Scalabrino S, Linares-Vasquez M, Poshyvanyk D, Oliveto R (2016) Improving code readability models with textual features. In: Proceedings of the International Conference on Program Comprehension (ICPC), pp 1–10
Shaffer T, Wise JL, Walters BM, Müller SC, Falcone M, Sharif B (2015) iTrace: Enabling eye tracking on software artifacts within the IDE to support software engineering tasks. In: Proceedings of the Joint Meeting on Foundations of Software Engineering (ESEC/FSE), pp 954–957
Sharafi Z, Soh Z, Guéhéneuc Y-G, Antoniol G (2012) Women and men — different but equal: on the impact of identifier style on source code reading. In: Proceedings of the International Conference on Program Comprehension (ICPC), pp 27–36
Sharafi Z, Shaffer T, Sharif B, Guéhéneuc Y-G (2015a) Eye-tracking metrics in software engineering. In: Proceedings of the Asia-Pacific Software Engineering Conference (APSEC), pp 96–103
Sharafi Z, Soh Z, Guéhéneuc Y-G (2015b) A systematic literature review on the usage of eye-tracking in software engineering. Inf Softw Technol 67:79–107
Sharif B, Falcone M, Maletic JI (2012) An eye-tracking study on the role of scan time in finding source code defects. In: Proceedings of the Symposium on Eye Tracking Research and Applications (ETRA), pp 381–384
Siegmund J, Kastner C, Apel S, Parnin C, Bethmann A, Leich T, Saake G, Brechmann A (2014) Understanding understanding source code with functional magnetic resonance imaging. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 378–389
Siegmund J, Peitek N, Parnin C, Apel S, Hofmeister J, Kastner C, Begel A, Bethmann A, Brechmann A (2017) Measuring neural efficiency of program comprehension. In: Proceedings of the Joint Meeting on Foundations of Software Engineering (ESEC/FSE), pp 140–150
Sokal RR, Michener CD (1958) A statistical method for evaluating systematic relationships. Univ Kansas Sci Bull 38:1409–1438
Takang AA, Grubb PA, Macredie RD (1996) The effects of comments and identifier names on program comprehensibility: an experimental investigation. J Prog Lang 4(3):143–167
Treacy Solovey E, Afergan D, Peck EM, Hincks SW, Jacob RJK (2015) Designing Implicit Interfaces for Physiological Computing. ACM Trans Comput-Hum Interact 21(6):1–27
Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2000) Experimentation in software engineering - an introduction. Kluwer Academic Publishers, Norwell
Yin RK (1994) Case Study Research: Design and Methods, 2nd edn. Sage Publications, New York
Acknowledgements
This work is supported by the NSF (award number CCF-1755995). The authors thank Thom Hemenway, Keon Sadatian, Nehemiah Salo, and Kyle Tilton for their help in developing tools for the environment in which we conducted the experiment. We also thank all students that participated in the experiment for their time and effort.
Additional information
Communicated by: Chanchal Roy, Janet Siegmund, and David Lo
This work is an extension of our previous paper (Fakhoury et al. 2018b).
Cite this article
Fakhoury, S., Roy, D., Ma, Y. et al. Measuring the impact of lexical and structural inconsistencies on developers’ cognitive load during bug localization. Empir Software Eng 25, 2140–2178 (2020). https://doi.org/10.1007/s10664-019-09751-4