An investigation of misunderstanding code patterns in C open-source software projects

Empirical Software Engineering

Abstract

Maintenance consumes 40% to 80% of software development costs, so writing source code that is easy to understand is essential to keeping maintenance costs down. Code understanding matters because developers often misread code and misjudge program behavior, which can lead to errors. Some source code patterns, such as reliance on operator precedence and use of the comma operator, have been shown to influence code understanding negatively. Despite these initial results, the patterns have not been evaluated in a real-world setting, so it is unclear whether developers agree that the patterns studied by researchers cause substantial misunderstandings in practice. To better understand the relevance of misunderstanding patterns, we applied a mixed-methods approach, combining repository mining with a survey of developers, to evaluate misunderstanding patterns in 50 C open-source projects, including Apache, OpenSSL, and Python. Overall, we found more than 109K occurrences of the 12 patterns in practice. Our study shows that, according to developers, only some of the patterns previously considered by researchers may cause misunderstandings. Our results complement previous studies by taking the perception of developers into account.
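To make the two patterns named in the abstract concrete, the following is a minimal illustrative sketch in C. It is not taken from the study or its survey; the function names and the specific expressions are hypothetical and chosen only to show how each pattern can be paired with a more explicit alternative that behaves the same way.

```c
/* Illustrative sketch only: these functions are hypothetical examples,
 * not code from the studied projects or from the survey. */

/* Operator precedence pattern: the reader must recall that '*' binds
 * more tightly than '+', so this computes a + (b * c). */
int precedence_pattern(int a, int b, int c) {
    return a + b * c;
}

/* Alternative: explicit parentheses state the grouping directly. */
int precedence_alternative(int a, int b, int c) {
    return a + (b * c);
}

/* Comma operator pattern: the left operand is evaluated for its side
 * effect and discarded; the expression yields the right operand. */
int comma_pattern(int x) {
    int y = (x += 2, x * 3);
    return y;
}

/* Alternative: one statement per step, with no comma operator. */
int comma_alternative(int x) {
    x += 2;
    return x * 3;
}
```

Under these assumptions, each pattern and its alternative return the same value for the same input; the difference lies only in how much of the language's evaluation rules the reader has to keep in mind.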

Notes

  1. https://github.com/curl/curl/blob/master/docs/CODE_STYLE.md

  2. http://cpsoftware.com.br/patterns/index.html

  3. http://www.openh264.org/

  4. https://rbcommons.com/s/OpenH264/r/465/diff/1#0

  5. https://developer.mozilla.org/docs/Mozilla/Developer_guide/Coding_Style

  6. https://github.com/google/styleguide

  7. https://www.kernel.org/doc/html/v4.10/process/coding-style.html

  8. http://www.srcml.org/

  9. https://clang-analyzer.llvm.org/

  10. https://www.viva64.com/en/pvs-studio/

  11. http://www.srcml.org/

Acknowledgments

We would like to thank Dan Gopstein for the useful feedback regarding our study. Apel’s work has been supported by the German Research Foundation (AP 206/6). This work was funded by CNPq (308380/2016-9, 477943/2013-6, 460883/2014-3, 465614/2014-0, 306610/2013-2, 307190/2015-3, and also CNPq 409335/2016-9), FAPEAL (PPG 14/2016), and CAPES grants (175956 and 117875).

Author information

Corresponding author

Correspondence to Flávio Medeiros.

Additional information

Communicated by: Christoph Treude

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Survey with Developers

We are investigating specific C constructions (code patterns) in source code. This survey presents some code patterns and asks you about their influence on how well the source code can be understood. For each question, we present the code pattern on the Left-Hand Side (LHS) and an alternative on the Right-Hand Side (RHS).

You should be able to answer our survey in around 10-15 minutes. We will use your answers to understand the practical use of code patterns and develop supporting tools. We really appreciate your help. Thanks!

About this article

Cite this article

Medeiros, F., Lima, G., Amaral, G. et al. An investigation of misunderstanding code patterns in C open-source software projects. Empir Software Eng 24, 1693–1726 (2019). https://doi.org/10.1007/s10664-018-9666-x

  • DOI: https://doi.org/10.1007/s10664-018-9666-x
