DOI: 10.1145/2972958.2972963
research-article

Measuring the Stylistic Inconsistency in Software Projects using Hierarchical Agglomerative Clustering

Published: 09 September 2016

Abstract

Background: Although many software engineering methodologies and guidelines exist, developers commonly apply their own programming styles to the source code they produce. Each developer finds their preferred style more comprehensible, but these styles may well conflict with one another. Stylistic inconsistency is therefore inevitable when multiple developers work on the same software, and it significantly degrades program readability and maintainability. Aims: Given the limited understanding of this problem, we perform an empirical analysis to quantitatively measure the degree of programming style inconsistency within a software project team. Method: We first propose stylistic fingerprints, represented as a set of attribute-counting metrics, to characterize different programming styles. We then adopt the hierarchical agglomerative clustering (HAC) technique to quantitatively measure the proximity of programming styles in six open source C/C++ projects chosen from different application domains. Results: The empirical results demonstrate the feasibility and validity of our fingerprinting methodology. Moreover, the proposed clustering procedure, which uses the HAC algorithm with dendrograms, effectively illustrates the degree of programming style inconsistency among source files, which is significant for future research. Conclusions: This study proposes an effective and efficient approach for analyzing programming style inconsistency, supported by a sound theoretical basis for dealing with the problem. The approach is ultimately intended to improve program readability and thereby reduce the maintenance overhead of software projects.
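The Method step can be illustrated with a minimal sketch. It is not the authors' implementation: the four metrics in `fingerprint`, the Euclidean distance, and the average-linkage rule in `hac` are all assumptions chosen for brevity, since the abstract does not list the actual attribute-counting metrics or the linkage criterion used in the paper.

```python
import math
import re
from typing import List, Tuple


def fingerprint(source: str) -> List[float]:
    """Hypothetical stylistic fingerprint: four counts per non-blank line."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    n = max(len(lines), 1)
    tab_indented = sum(ln.startswith("\t") for ln in lines)
    brace_own_line = sum(ln.strip() == "{" for ln in lines)
    snake_names = len(re.findall(r"\b[a-z]+_[a-z_]+\b", source))
    camel_names = len(re.findall(r"\b[a-z]+[A-Z][A-Za-z]*\b", source))
    return [tab_indented / n, brace_own_line / n, snake_names / n, camel_names / n]


def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hac(vectors: List[List[float]]) -> List[Tuple[tuple, tuple, float]]:
    """Average-linkage HAC (an assumed linkage); returns the merge history."""
    clusters = [(i,) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Mean pairwise distance between members of the two clusters.
                total = sum(euclidean(vectors[p], vectors[q])
                            for p in clusters[i] for q in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Four made-up C snippets: two use spaces and snake_case, two use tabs,
# braces on their own line, and camelCase.
files = [
    "int add_two(int x) {\n    int new_value = x + 2;\n    return new_value;\n}\n",
    "int sub_two(int x) {\n    int new_value = x - 2;\n    return new_value;\n}\n",
    "int addTwo(int x)\n{\n\tint newValue = x + 2;\n\treturn newValue;\n}\n",
    "int subTwo(int x)\n{\n\tint newValue = x - 2;\n\treturn newValue;\n}\n",
]
vectors = [fingerprint(src) for src in files]
merges = hac(vectors)
for left, right, dist in merges:
    print(left, right, round(dist, 3))
```

The merge history is the information a dendrogram draws: files that share a style join at a small distance, and the height of the final merge gives a rough sense of how far apart the styles in the project are.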

Cited By

  • (2022) Synergies Between Artificial Intelligence and Software Engineering: Evolution and Trends. Handbook on Artificial Intelligence-Empowered Applied Software Engineering, pp. 11-36. DOI: 10.1007/978-3-031-08202-3_2. Online publication date: 4-Sep-2022
  • (2017) Using Eye Tracking Technology to Analyze the Impact of Stylistic Inconsistency on Code Readability. 2017 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C), pp. 579-580. DOI: 10.1109/QRS-C.2017.102. Online publication date: Jul-2017

Published In

ACM Other conferences
PROMISE 2016: Proceedings of the 12th International Conference on Predictive Models and Data Analytics in Software Engineering
September 2016
84 pages
ISBN: 9781450347723
DOI: 10.1145/2972958
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. empirical software engineering
  2. hierarchical agglomerative clustering
  3. programming style
  4. stylistic inconsistency

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PROMISE 2016

Acceptance Rates

PROMISE 2016 Paper Acceptance Rate 10 of 23 submissions, 43%;
Overall Acceptance Rate 98 of 213 submissions, 46%

Article Metrics

  • Downloads (Last 12 months): 17
  • Downloads (Last 6 weeks): 6

Reflects downloads up to 17 Feb 2025
