Linguistic antipatterns: what they are and how developers perceive them

Empirical Software Engineering

Abstract

Antipatterns are known as poor solutions to recurring problems. For example, Brown et al. and Fowler define practices concerning poor design or implementation solutions. However, the source code lexicon is also among the factors that affect the psychological complexity of a program, i.e., the factors that make a program difficult for humans to understand and maintain. The aim of this work is to identify recurring poor practices related to inconsistencies among the naming, documentation, and implementation of an entity, called Linguistic Antipatterns (LAs), that may impair program understanding. To this end, we first mine examples of such inconsistencies in real open-source projects and abstract them into a catalog of 17 recurring LAs related to methods and attributes. Then, to understand the relevance of LAs, we perform two empirical studies with developers: 30 external (i.e., not familiar with the code) and 14 internal (i.e., people developing or maintaining the code). Results indicate that the majority of the participants (69% of the external and 51% of the internal developers) perceive LAs as poor practices that should therefore be avoided. As further evidence of LAs’ validity, open-source developers who were made aware of LAs reacted by changing the code in 10% of the cases. Finally, to facilitate the use of LAs in practice, we identify a subset of LAs that were universally agreed upon as being problematic: those with a clear dissonance between code behavior and lexicon.

Notes

  1. http://cocoon.apache.org

  2. http://www.eclipse.org

  3. http://www.veneraarnaoudova.ca/tools

  4. http://eclipse-cs.sourceforge.net/

  5. http://checkstyle.sourceforge.net/

  6. http://www.oracle.com/technetwork/java/codeconv-138413.html

  7. For projects where we did not provide a version, we used version control (accessed on 31/05/2013).

  8. http://ser.soccerlab.polymtl.ca/ser-repos/public/tr-data/lapd-rep-pckg.zip

  9. http://argouml.tigris.org

  10. http://cocoon.apache.org

  11. http://www.eclipse.org

  12. http://www.sensopia.com/english/index.html

  13. None of the questionnaires containing examples of type C.2 was answered.

  14. We do not report project names with the examples to preserve the confidentiality of the provided answers.

  15. A change may be one or more of the following: modification, addition, or removal.

References

  • Abbes M, Khomh F, Guéhéneuc YG, Antoniol G (2011) An empirical study of the impact of two antipatterns, Blob and Spaghetti Code, on program comprehension. In: Proceedings of the European Conference on Software Maintenance and Reengineering (CSMR), pp 181–190

  • Abebe S, Tonella P (2011) Towards the extraction of domain concepts from the identifiers. In: Proceedings of the Working Conference on Reverse Engineering (WCRE), pp 77–86

  • Abebe S, Tonella P (2013) Automated identifier completion and replacement. In: Proceedings of the European Conference on Software Maintenance and Reengineering (CSMR), pp 263–272

  • Abebe SL, Haiduc S, Tonella P, Marcus A (2011) The effect of lexicon bad smells on concept location in source code. In: Proceedings of the International Working Conference on Source Code Analysis and Manipulation (SCAM), pp 125–134

  • Abebe SL, Arnaoudova V, Tonella P, Antoniol G, Guéhéneuc YG (2012) Can lexicon bad smells improve fault prediction? In: Proceedings of the Working Conference on Reverse Engineering (WCRE), pp 235–244

  • Anquetil N, Lethbridge T (1998) Assessing the relevance of identifier names in a legacy software system. In: Proceedings of the International Conference of the Centre for Advanced Studies on Collaborative Research (CASCON), pp 213–222

  • Arnaoudova V, Di Penta M, Antoniol G, Guéhéneuc YG (2013) A new family of software anti-patterns: Linguistic anti-patterns. In: Proceedings of the European Conference on Software Maintenance and Reengineering (CSMR), pp 187–196

  • Arnaoudova V, Eshkevari L, Di Penta M, Oliveto R, Antoniol G, Guéhéneuc YG (2014) Repent: Analyzing the nature of identifier renamings. IEEE Trans Softw Eng (TSE) 40(5):502–532

  • Brooks R (1983) Towards a theory of the comprehension of computer programs. Int J Man-Machine Stud 18(6):543–554

  • Brown WJ, Malveau RC, Brown WH, McCormick III HW, Mowbray TJ (1998a) Anti patterns: refactoring software, architectures, and projects in crisis, 1st edn. Wiley, New York

  • Brown WJ, Malveau RC, McCormick III HW, Mowbray TJ (1998b) AntiPatterns: refactoring software, architectures, and projects in crisis. Wiley, New York

  • Caprile B, Tonella P (1999) Nomen est omen: Analyzing the language of function identifiers. In: Proceedings of Working Conference on Reverse Engineering (WCRE), pp 112–122

  • Caprile B, Tonella P (2000) Restructuring program identifier names. In: Proceedings of the International Conference on Software Maintenance (ICSM), pp 97–107

  • Chaudhary BD, Sahasrabuddhe HV (1980) Meaningfulness as a factor of program complexity. In: Proceedings of the ACM Annual Conference, ACM, ACM ’80, pp 457–466

  • De Lucia A, Di Penta M, Oliveto R (2011) Improving source code lexicon via traceability and information retrieval. IEEE Trans Softw Eng 37(2):205–227

  • Deissenbock F, Pizka M (2005) Concise and consistent naming. In: Proceedings of the International Workshop on Program Comprehension (IWPC), pp 97–106

  • Fowler M (1999) Refactoring: improving the design of existing code. Addison-Wesley, MA

  • Gamma E, Helm R, Johnson R, Vlissides J (1995) Design patterns: elements of reusable object oriented software. Addison-Wesley, Boston

  • Glaser BG (1992) Basics of grounded theory analysis. Sociology Press

  • Grissom RJ, Kim JJ (2005) Effect sizes for research: a broad practical approach, 2nd edn. Lawrence Erlbaum Associates

  • Groves RM, Fowler Jr FJ, Couper MP, Lepkowski JM, Singer E, Tourangeau R (2009) Survey methodology, 2nd edn. Wiley, New York

  • Hintze JL, Nelson RD (1998) Violin plots: a box plot-density trace synergism. Am Stat 52(2):181–184

  • Jedlitschka A, Pfahl D (2005) Reporting guidelines for controlled experiments in software engineering. In: International symposium on empirical software engineering

  • Khomh F, Di Penta M, Guéhéneuc YG (2009) An exploratory study of the impact of code smells on software change-proneness. In: Proceedings of the working conference on reverse engineering (WCRE), pp 75–84

  • Khomh F, Di Penta M, Guéhéneuc YG, Antoniol G (2012) An exploratory study of the impact of antipatterns on class change- and fault-proneness. Empir Softw Eng 17(3):243–275

  • Kitchenham B, Pfleeger S, Pickard L, Jones P, Hoaglin D, El Emam K, Rosenberg J (2002) Preliminary guidelines for empirical research in software engineering. IEEE Trans Softw Eng (TSE) 28(8):721–734

  • Lawrie D, Morrell C, Feild H, Binkley D (2006) What’s in a name? a study of identifiers. In: Proceedings of the International Conference on Program Comprehension (ICPC), pp 3–12

  • Lawrie D, Morrell C, Feild H, Binkley D (2007) Effective identifier names for comprehension and memory. Innovations Syst Softw Eng 3(4):303–318

  • Merlo E, McAdam I, De Mori R (2003) Feed-forward and recurrent neural networks for source code informal information analysis. J Softw Maint 15(4):205–244

  • Miller GA (1995) WordNet: a lexical database for English. Commun ACM 38(11):39–41

  • Moha N, Guéhéneuc YG, Duchien L, Le Meur AF (2010) DECOR: a method for the specification and detection of code and design smells. IEEE Trans Softw Eng (TSE’10) 36(1):20–36

  • Nagappan M, Zimmermann T, Bird C (2013) Diversity in software engineering research. In: Proceedings of the joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pp 466–476

  • Oppenheim AN (1992) Questionnaire design, interviewing and attitude measurement. Pinter, London

  • Palomba F, Bavota G, Di Penta M, Oliveto R, De Lucia A, Poshyvanyk D (2013) Detecting bad smells in source code using change history information. In: Proceedings of the international conference on automated software engineering (ASE), pp 268–278

  • Palomba F, Bavota G, Di Penta M, Oliveto R, De Lucia A (2014) Do they really smell bad? A study on developers’ perception of code bad smells. In: International conference on software maintenance and evolution (ICSME), to appear

  • Parsons J, Saunders C (2004) Cognitive heuristics in software engineering: applying and extending anchoring and adjustment to artifact reuse. IEEE Trans Softw Eng (TSE) 30(12):873–888

  • Prechelt L, Unger-Lamprecht B, Philippsen M, Tichy W (2002) Two controlled experiments assessing the usefulness of design pattern documentation in program maintenance. IEEE Trans Softw Eng (TSE) 28(6):595–606

  • Raţiu D, Ducasse S, Girba T, Marinescu R (2004) Using history information to improve design flaws detection. In: Proceedings of the European conference on software maintenance and reengineering (CSMR), pp 223–232

  • Sheil BA (1981) The psychological study of programming. ACM Comput Surv (CSUR) 13(1):101–120

  • Shneiderman B (1977) Measuring computer program quality and comprehension. Int J Man-Machine Stud 9(4):465–478

  • Shneiderman B, Mayer R (1975) Towards a cognitive model of programmer behavior, Tech Rep, vol 37. Indiana University, Bloomington

  • Shull F, Singer J, Sjøberg DI (eds) (2007) Guide to advanced empirical software engineering. Springer, New York

  • Strauss AL (1987) Qualitative analysis for social scientists. Cambridge University Press

  • Takang A, Grubb PA, Macredie RD (1996) The effects of comments and identifier names on program comprehensibility: an experiential study. J Program Lang 4(3):143–167

  • Tan L, Yuan D, Krishna G, Zhou Y (2007) /*iComment: bugs or bad comments?*/. In: Proceedings of the ACM SIGOPS Symposium on Operating Systems Principles (SOSP) 41(6):145–158

  • Tan L, Zhou Y, Padioleau Y (2011) aComment: mining annotations from comments and code to detect interrupt related concurrency bugs. In: Proceedings of the International Conference on Software Engineering (ICSE)

  • Tan SH, Marinov D, Tan L, Leavens GT (2012) @tComment: Testing Javadoc comments to detect comment-code inconsistencies. In: Proceedings of the international conference on software testing, verification and validation (ICST), pp 260–269

  • Torchiano M (2002) Documenting pattern use in Java programs. In: Proceedings of the international conference on software maintenance (ICSM), pp 230–233

  • Toutanova K, Manning CD (2000) Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In: Proceedings of the Joint SIGDAT conference on empirical methods in natural language processing and very large corpora (EMNLP/VLC-2000), association for computational linguistics, pp 63–70

  • Weissman L (1974a) Psychological complexity of computer programs: an experimental methodology. SIGPLAN Not 9(6):25–36

  • Weissman LM (1974b) A methodology for studying the psychological complexity of computer programs. PhD thesis

  • Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2000) Experimentation in software engineering - an introduction. Kluwer, Boston

  • Woodfield SN, Dunsmore HE, Shen VY (1981) The effect of modularization and comments on program comprehension. In: Proceedings of the international conference on software engineering (ICSE), pp 215–223

  • Yamashita A, Moonen L (2013) Do developers care about code smells? - An exploratory survey. In: Proceedings of the working conference on reverse engineering (WCRE), pp 242–251

  • Zhong H, Zhang L, Xie T, Mei H (2011) Inferring specifications for resources from natural language API documentation. Autom Softw Eng 18(3–4):227–261

Acknowledgments

The authors would like to thank the participants in the two studies for their precious time and effort. They made this work possible.

Author information

Corresponding author

Correspondence to Venera Arnaoudova.

Additional information

Communicated by: Nachiappan Nagappan

This work is an extension of our previous paper (Arnaoudova et al. 2013).

Appendix

A Detection

A.1 - “Get” more than accessor:

Find accessor methods by identifying methods whose name starts with ‘get’ and ends with a substring that corresponds to an attribute in the same class, and where the attribute’s declared type and the accessor’s return type are the same. Then, identify those accessors that perform more actions than returning the corresponding attribute. Cases where the attribute is set before it is returned (i.e., Proxy and Singleton design patterns) should not be considered part of this LA. For a detection built on top of an Abstract Syntax Tree (AST), statements other than the return statement that returns the attribute are allowed only if they are children of a conditional check for a null value. Other measures of complexity, such as LOC or McCabe’s Cyclomatic Complexity, can be used for a simpler but less accurate detection.
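
As a rough illustration of the simpler, metric-based variant, the following Python sketch identifies accessor candidates and flags those whose body is longer than a single line. The `Method` record, the exact-suffix matching, and the `max_loc` threshold are assumptions made for this example only. A getter that sets the attribute after a null check would also be flagged here, which is precisely the imprecision that the AST-based variant avoids.

```python
from dataclasses import dataclass


@dataclass
class Method:
    name: str
    return_type: str
    body_loc: int  # lines of code in the method body (hypothetical metric input)


def is_accessor(method: Method, attributes: dict) -> bool:
    """True if the name is 'get' + an attribute name and the types are the same.

    `attributes` maps each attribute name of the class to its declared type.
    """
    if not method.name.startswith("get") or len(method.name) == 3:
        return False
    suffix = method.name[3:]
    attribute = suffix[0].lower() + suffix[1:]  # getCount -> count
    return attributes.get(attribute) == method.return_type


def get_more_than_accessor(method: Method, attributes: dict, max_loc: int = 1) -> bool:
    """Flag accessors whose body exceeds max_loc lines (assumed threshold)."""
    return is_accessor(method, attributes) and method.body_loc > max_loc
```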

A.2 - “Is” returns more than a Boolean:

Find methods starting with “is” whose return type is not Boolean.

A.3 - “Set” method returns:

Find modifier methods (or, more generally, methods whose name starts with “set”) whose return type is different from void.
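
Rules A.2 and A.3 both compare the leading word of a method name with its return type, so they can be sketched together. The Python fragment below is a minimal illustration under the assumption that identifiers follow camelCase; the function names are hypothetical. Taking the whole leading lower-case word avoids matching names such as `island` or `settle`.

```python
import re


def first_word(identifier: str) -> str:
    """Return the leading lower-case word of a camelCase identifier."""
    return re.match(r"[a-z]*", identifier).group(0)


def check_prefix_rules(name: str, return_type: str) -> list:
    """Return the rule identifiers (A.2, A.3) that the method signature triggers."""
    findings = []
    prefix = first_word(name)
    if prefix == "is" and return_type not in ("boolean", "Boolean"):
        findings.append("A.2")
    if prefix == "set" and return_type != "void":
        findings.append("A.3")
    return findings
```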

A.4 - Expecting but not getting single instance:

Find methods returning a collection (e.g., array, list, vector, etc.) but whose name ends with a singular noun and does not contain a word implying a collection (e.g., array, list, vector, etc.).
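
The check can be sketched in Python as follows. The word lists and the suffix-based singular test are assumptions for illustration: deciding whether a word is a singular noun properly requires part-of-speech information, and the naive test below misjudges words such as "status" or "children".

```python
import re

# Illustrative, deliberately small word lists (assumptions, not an exhaustive catalog).
COLLECTION_WORDS = {"array", "list", "vector", "collection", "map"}
# "set" is only considered in type names, since in method names it is usually a verb.
COLLECTION_TYPE_WORDS = COLLECTION_WORDS | {"set"}


def split_words(identifier: str) -> list:
    """Split a camelCase identifier into lower-case words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]


def is_collection_type(type_name: str) -> bool:
    """Treat arrays and types whose name mentions a collection word as collections."""
    if type_name.endswith("[]"):
        return True
    raw = type_name.split("<")[0]  # drop generic parameters
    return any(w in COLLECTION_TYPE_WORDS for w in split_words(raw))


def looks_singular(word: str) -> bool:
    # Naive stand-in for a part-of-speech tagger.
    return not word.endswith("s") or word.endswith("ss")


def expecting_single_instance(name: str, return_type: str) -> bool:
    words = split_words(name)
    if not words or not is_collection_type(return_type):
        return False
    if any(w in COLLECTION_WORDS for w in words):
        return False
    return looks_singular(words[-1])
```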

B.1 - Not implemented condition:

Find methods with at least one conditional sentence in comments but with no conditional statements in the implementation (e.g., no control structures or ternary operators).
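
A purely textual approximation of this rule is sketched below in Python. Both keyword lists are assumptions chosen for the example, and the matching ignores string literals and generics wildcards (`?`) in the method body, so a parser-based implementation would be more precise.

```python
import re

# Words that commonly introduce a condition in English comments (illustrative list).
COMMENT_CONDITION = re.compile(
    r"\b(if|unless|when|whenever|otherwise|in case)\b", re.IGNORECASE
)
# Java constructs counted as conditional: control structures and the ternary operator.
CODE_CONDITION = re.compile(r"\b(if|switch|while|for|do)\b|\?")


def not_implemented_condition(comment: str, body: str) -> bool:
    """True if the comment states a condition that the body never checks."""
    return bool(COMMENT_CONDITION.search(comment)) and not CODE_CONDITION.search(body)
```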

B.2 - Validation method does not confirm:

Find validation methods (e.g., method names starting with “validate”, “check”, “ensure”) whose return type is void and that do not throw an exception.

B.3 - “Get” method does not return:

Find methods where the name suggests a return value (e.g., names starting with “get”, “return”) but where the return type is void.

B.4 - Not answered question:

Find methods whose name is in the form of a predicate (e.g., starts with “is”, “has”) and whose return type is void.
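
Rules B.2 to B.4 share one shape: a name prefix that promises a result combined with a void return type. The table-driven Python sketch below illustrates this under the assumption that each method is summarized by a hypothetical `Method` record whose `throws` flag says whether the method can throw an exception.

```python
import re
from dataclasses import dataclass


@dataclass
class Method:
    name: str
    return_type: str
    throws: bool = False  # True if the method throws an exception (assumed input)


# Prefixes taken from the examples given in the rule descriptions.
RULES = {
    "B.2": ("validate", "check", "ensure"),
    "B.3": ("get", "return"),
    "B.4": ("is", "has"),
}


def void_rule_violations(method: Method) -> list:
    """Return the identifiers of the void-return rules that the method triggers."""
    if method.return_type != "void":
        return []
    prefix = re.match(r"[a-z]*", method.name).group(0)  # leading camelCase word
    hits = [rule for rule, prefixes in RULES.items() if prefix in prefixes]
    # B.2 only applies when the outcome is not signalled through an exception either.
    if method.throws and "B.2" in hits:
        hits.remove("B.2")
    return hits
```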

B.5 - Transform method does not return:

Find methods whose name suggests a transformation of an object (e.g., toSomething, source2target) but whose return type is void.
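
The two naming shapes given as examples can be captured with a regular expression, as in the Python sketch below. The pattern is an assumption covering only those two shapes; it misses names such as `convertToHtml` and can misfire on names such as `is2D`.

```python
import re

# toSomething: "to" followed by an upper-case letter.
# source2target: a "2" placed between two letters.
TRANSFORM_NAME = re.compile(r"^to[A-Z]|[A-Za-z]2[A-Za-z]")


def transform_does_not_return(name: str, return_type: str) -> bool:
    """True if the name suggests a transformation but nothing is returned."""
    return return_type == "void" and TRANSFORM_NAME.search(name) is not None
```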

B.6 - Expecting but not getting a collection:

Find methods whose name suggests that they return (e.g., it starts with “get” or “return”) multiple objects (e.g., it ends with a plural noun) but whose return type is not a collection.

C.1 - Method name and return type are opposite:

Find methods where the name and return type contain antonyms.
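
A minimal Python sketch of the antonym check follows. The antonym pairs are a small hand-written table assumed for this example; a lexical database such as WordNet (Miller 1995) could supply them in a real detector. The method and type names in the usage below are hypothetical.

```python
import re

# Hypothetical, hand-written antonym pairs used only for illustration.
ANTONYMS = {
    ("start", "stop"),
    ("enable", "disable"),
    ("open", "close"),
    ("add", "remove"),
}


def split_words(identifier: str) -> list:
    """Split a camelCase identifier into lower-case words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]


def are_antonyms(a: str, b: str) -> bool:
    return (a, b) in ANTONYMS or (b, a) in ANTONYMS


def name_and_type_opposite(name: str, type_name: str) -> bool:
    """True if any word of the name is an antonym of any word of the type."""
    return any(
        are_antonyms(n, t)
        for n in split_words(name)
        for t in split_words(type_name)
    )
```

For instance, `name_and_type_opposite("disable", "ControlEnableState")` holds because "disable" and "enable" form an antonym pair. The same function can be applied to an attribute name and its declaring type, as required by F.1.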

C.2 - Method signature and comment are opposite:

Find methods whose name or return type has an antonym relation with their comment.

D.1 - Says one but contains many:

Find attributes having a name ending with a singular noun and having a collection as declaring type.

D.2 - Name suggests Boolean but type does not:

Find attributes whose name is structured as a predicate, i.e., starting with a verb in third person (e.g., “is”, “has”) or ending with a verb in gerund/present participle, but whose declaring type is not Boolean.
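
The following Python sketch approximates this rule. Recognizing verbs reliably requires part-of-speech tagging, so the verb list, the "-ing" suffix test, and the exclusion list below are assumptions that only illustrate the idea.

```python
import re

PREDICATE_VERBS = {"is", "has"}                      # third-person verbs from the examples
NON_VERB_ING = {"string", "thing", "ring", "king"}   # illustrative "-ing" nouns to skip


def split_words(identifier: str) -> list:
    """Split a camelCase identifier into lower-case words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]


def name_suggests_boolean(name: str) -> bool:
    words = split_words(name)
    if not words:
        return False
    last = words[-1]
    starts_with_verb = words[0] in PREDICATE_VERBS
    ends_with_gerund = last.endswith("ing") and last not in NON_VERB_ING
    return starts_with_verb or ends_with_gerund


def boolean_name_non_boolean_type(name: str, type_name: str) -> bool:
    """True if the attribute name reads as a predicate but the type is not Boolean."""
    return name_suggests_boolean(name) and type_name not in ("boolean", "Boolean")
```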

E.1 - Says many but contains one:

Find attributes whose name ends with a plural noun but whose type is neither a collection nor contains a plural noun.

F.1 - Attribute name and type are opposite:

Find attributes whose name and declaring type contain antonyms.

F.2 - Attribute signature and comment are opposite:

Find attributes whose name or declaring type has an antonym relation with their comment.

About this article

Cite this article

Arnaoudova, V., Di Penta, M. & Antoniol, G. Linguistic antipatterns: what they are and how developers perceive them. Empir Software Eng 21, 104–158 (2016). https://doi.org/10.1007/s10664-014-9350-8

  • DOI: https://doi.org/10.1007/s10664-014-9350-8
