How does code style inconsistency affect pull request integration? An exploratory study on 117 GitHub projects

Empirical Software Engineering

Abstract

GitHub is a popular code hosting platform that provides infrastructure to facilitate collaborative development. The Pull Request (PR) is one of its key mechanisms for supporting collaboration: developers are encouraged to submit PRs to request the integration of their contributions. In practice, project maintainers do not integrate every submitted PR into the codebase. Existing studies have investigated factors that affect PR integration. Nevertheless, the code style of PRs, which project maintainers widely take into account, has not yet been studied in depth. In this paper, we performed an exploratory analysis of the effect of code style on PR integration in GitHub. We modeled code style via the inconsistency between a submitted PR and the existing code in its target codebase, so our study is not limited by a specific definition of code style. We conducted our experiments on 50,092 closed PRs in 117 Java projects. Our findings show that: (1) Code style inconsistency does exist between PRs and the codebase. (2) Several code style criteria, namely those on how to use spaces or indents, write comments, and keep code lines to a suitable length, tend to show more inconsistency among PRs. (3) A PR that is consistent with the current code style tends to be merged into the codebase more easily. (4) A PR that violates the current code style is likely to take more time to get closed. Our study provides developers with evidence on how to deliver better contributions and facilitate efficient collaboration.
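The abstract does not spell out how inconsistency is computed, so the following is only a minimal sketch of one plausible reading: for each style criterion, compare the violation density (violations per line of code) in the PR with that in the target codebase. The function names, the density-based measure, and the criterion labels `Indentation` and `LineLength` are assumptions made for illustration, not the authors' definition.

```python
from collections import Counter


def violation_density(violations, lines_of_code):
    """Return violations per line of code for each style criterion.

    `violations` is a list of criterion names, one entry per reported
    violation (hypothetical input format).
    """
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    counts = Counter(violations)
    return {criterion: n / lines_of_code for criterion, n in counts.items()}


def style_inconsistency(pr_violations, pr_loc, base_violations, base_loc):
    """Assumed measure: absolute gap in violation density per criterion."""
    pr = violation_density(pr_violations, pr_loc)
    base = violation_density(base_violations, base_loc)
    criteria = sorted(set(pr) | set(base))
    return {c: abs(pr.get(c, 0.0) - base.get(c, 0.0)) for c in criteria}


if __name__ == "__main__":
    # Toy data: a 10-line PR against a 100-line codebase.
    gaps = style_inconsistency(
        ["Indentation", "Indentation", "LineLength"], 10,
        ["LineLength"] * 5, 100,
    )
    for criterion, gap in gaps.items():
        print(f"{criterion}: {gap:.3f}")
```

Under this assumed measure, a criterion that the codebase never violates but the PR violates often yields a large gap, which matches the intuition of a PR that departs from the prevailing style without relying on any particular style guide.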

Notes

  1. http://github.com

  2. http://github.com/features

  3. In GitHub, a “repository” denotes a project in general. In this paper, we use “repository” and “project” interchangeably.

  4. http://github.com/rails/rails

  5. https://github.com/AndlyticsProject/andlytics

  6. http://github.com/AndlyticsProject/andlytics#contributing

  7. http://www.gnu.org/prep/standards_toc.html

  8. http://github.com/google/styleguide

  9. It is common for a project to have a readme file or a contribution file. The readme file broadly describes the project, while the contribution file mainly gives tips on how to contribute to the project.

  10. http://guides.github.com/activities/contributing-to-open-source/#contributing

  11. http://github.com/mongodb/mongo

  12. http://github.com/rubinius/rubinius/pull/3512

  13. http://github.com/querydsl/querydsl

  14. In this study, some motivating examples (i.e., GNU, Google, and GitHub) came mainly from a manual search of code style related documentation in well-known open source communities or company-originated open source projects, while other examples (i.e., mongodb/mongo, rubinius/rubinius, and querydsl/querydsl) were collected by manually checking the documents and commit logs of some randomly selected popular projects on GitHub.

  15. http://githut.info

  16. https://www.githubarchive.org/

  17. https://help.github.com/articles/filtering-issues-and-pull-requests-by-assignees/

  18. http://github.com/filipg/amu_automata_2011

  19. http://checkstyle.sourceforge.net/

  20. Checkstyle is a highly configurable tool for checking code style. It supports the code styles defined by Google and Oracle. In our experiment, we configured Checkstyle to check whether a piece of code violates any of 37 code style criteria.

  21. http://github.com/magefree/mage

  22. http://github.com/CUTR-at-USF/OpenTripPlanner-for-Android

  23. http://sourceforge.net

  24. http://bitbucket.org/

  25. http://findbugs.sourceforge.net

References

  • Allamanis M, Barr ET, Bird C, Sutton CA (2014) Learning natural coding conventions. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp 281–293

  • Bacchelli A, Bird C (2013) Expectations, outcomes, and challenges of modern code review. In: Proceedings of the 35th International Conference on Software Engineering, pp 712–721

  • Balachandran V (2013) Reducing human effort and improving quality in peer code reviews using automatic static analysis and reviewer recommendation. In: Proceedings of the 35th International Conference on Software Engineering, pp 931–940

  • Bartoń K (2013) MuMIn: Multi-model inference. R package version 1.9.13. The Comprehensive R Archive Network (CRAN), Vienna, Austria

  • Bates DM (2010) lme4: Mixed-effects modeling with R

  • Berry RE, Meekings BAE (1985) A style analysis of C programs. Commun ACM 28(1):80–88

  • Boogerd C, Moonen L (2009) Evaluating the relation between coding standard violations and faults within and across software versions. In: Proceedings of the 6th International Working Conference on Mining Software Repositories, pp 41–50

  • Bridger A, Pisano J (2001) C++ coding standards

  • Butler S, Wermelinger M, Yu Y, Sharp H (2009) Relating identifier naming flaws and code quality: An empirical study. In: Proceedings of the 16th Working Conference on Reverse Engineering, pp 31–35

  • Cohen J (1977) Statistical power analysis for the behavioral sciences (revised ed.)

  • Cohen J, Cohen P, West SG, Aiken LS (2013) Applied multiple regression/correlation analysis for the behavioral sciences. Routledge, Evanston

  • Cohen-Goldberg AM (2012) Phonological competition within the word: Evidence from the phoneme similarity effect in spoken production. J Mem Lang 67(1):184–198

  • de Lima Júnior ML, Soares DM, Plastino A, Murta L (2015) Developers assignment for analyzing pull requests. In: Proceedings of the 30th Annual ACM Symposium on Applied Computing, pp 1567–1572

  • Google (2013) Google Java code style. http://google.github.io/styleguide/javaguide.html

  • Gousios G, Pinzger M, van Deursen A (2014) An exploratory study of the pull-based software development model. In: Proceedings of the 36th International Conference on Software Engineering, pp 345–355

  • Gousios G, Zaidman A, Storey MD, van Deursen A (2015) Work practices and challenges in pull-based development: The integrator’s perspective. In: Proceedings of the 37th IEEE/ACM International Conference on Software Engineering, pp 358–368

  • Gousios G, Storey MD, Bacchelli A (2016) Work practices and challenges in pull-based development: the contributor’s perspective. In: Proceedings of the 38th International Conference on Software Engineering, pp 285–296

  • Graham MH (2003) Confronting multicollinearity in ecological multiple regression. Ecology 84(11):2809–2815

  • Hauke J, Kossowski T (2011) Comparison of values of pearson’s and spearman’s correlation coefficients on the same sets of data. Quaest Geogr 30(2):87–93

  • Hellendoorn V, Devanbu PT, Bacchelli A (2015) Will they like this? evaluating code contributions with language models. In: Proceedings of the 12th IEEE/ACM Working Conference on Mining Software Repositories, pp 157–167

  • Jaeger FT (2011) Fitting, Evaluating, and Reporting Mixed Models

  • Jiarpakdee J, Tantithamthavorn C, Treude C (2018) Autospearman: Automatically mitigating correlated software metrics for interpreting defect models. In: Proceedings of the 34th International Conference on Software Maintenance and Evolution, pp 92–103

  • Johnson PC (2014) Extension of Nakagawa & Schielzeth’s R2GLMM to random slopes models. Methods Ecol Evol 5(9):944–946

  • Kabacoff R (2015) R in action: data analysis and graphics with R. Manning Publications Co.

  • Kalliamvakou E, Gousios G, Blincoe K, Singer L, Germán DM, Damian D (2014) The promises and perils of mining github. In: Proceedings of the 11th Working Conference on Mining Software Repositories, pp 92–101

  • Kalliamvakou E, Damian DE, Blincoe K, Singer L, Germán DM (2015) Open source-style collaborative development practices in commercial projects using github. In: Proceedings of the 37th IEEE/ACM International Conference on Software Engineering, pp 574–585

  • Lemhöfer K, Dijkstra T, Schriefers H, Baayen RH, Grainger J, Zwitserlood P (2008) Native language influences on word recognition in a second language: A megastudy. J Exper Psychol Learn Memory Cogn 34(1):12

  • Mäntylä MV, Lassenius C (2009) What types of defects are really discovered in code reviews? IEEE Trans Softw Eng 35(3):430–448

  • Marca D (1981) Some pascal style guidelines. ACM Sigplan Not 16(4):70–80

  • McConnell S (1993) Code complete: a practical handbook of software construction. Microsoft Press

  • Miara RJ, Musselman JA, Navarro JA, Shneiderman B (1983) Program indentation and comprehensibility. Commun ACM 26(11):861–867

  • Nakagawa S, Schielzeth H (2013) A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods Ecol Evol 4(2):133–142

  • Oman PW, Cook CR (1990) A taxonomy for programming style. In: Proceedings of the ACM 18th Annual Computer Science Conference on Cooperation, pp 244–250

  • Oracle (1999) Oracle java code style. http://www.oracle.com/technetwork/java/codeconvtoc-136057.html

  • Padhye R, Mani S, Sinha VS (2014) A study of external community contribution to open-source projects on github. In: Proceedings of the 11th Working Conference on Mining Software Repositories, pp 332–335

  • Rahman MM, Roy CK (2014) An insight into the pull requests of github. In: Proceedings of the 11th Working Conference on Mining Software Repositories, pp 364–367

  • Rees MJ (1982) Automatic assessment aids for pascal programs. SIGPLAN Not 17(10):33–42

  • Rigby PC, Storey MD (2011) Understanding broadcast based peer review on open source software projects. In: Proceedings of the 33rd International Conference on Software Engineering, pp 541–550

  • Sadowski C, Aftandilian E, Eagle A, Miller-Cushon L, Jaspan C (2018) Lessons from building static analysis tools at google. Commun ACM 61(4):58–66

  • Selya AS, Rose JS, Dierker LC, Hedeker D, Mermelstein RJ (2012) A practical guide to calculating cohen’s f2, a measure of local effect size, from proc mixed. Front Psychol 3:111

  • Smit M, Gergel B, Hoover HJ, Stroulia E (2011) Code convention adherence in evolving software. In: Proceedings of the IEEE 27th International Conference on Software Maintenance, pp 504–507

  • Soares DM, de Lima Júnior ML, Murta L, Plastino A (2015) Acceptance factors of pull requests in open-source projects. In: Proceedings of the 30th Annual ACM Symposium on Applied Computing, pp 1541–1546

  • Tsay J, Dabbish L, Herbsleb JD (2014a) Influence of social and technical factors for evaluating contribution in github. In: Proceedings of the 36th International Conference on Software Engineering, pp 356–366

  • Tsay J, Dabbish L, Herbsleb JD (2014b) Let’s talk about it: evaluating contributions through discussion in github. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp 144–154

  • Vasilescu B, Yu Y, Wang H, Devanbu PT, Filkov V (2015) Quality and productivity outcomes relating to continuous integration in github. In: Proceedings of the 10th Joint Meeting on Foundations of Software Engineering, pp 805–816

  • van der Veen E, Gousios G, Zaidman A (2015) Automatically prioritizing pull requests. In: Proceedings of the 12th IEEE/ACM Working Conference on Mining Software Repositories, pp 357–361

  • Vermeulen A (2000) The Elements of Java (TM) Style. Cambridge University Press, Cambridge

  • Yu Y, Wang H, Yin G, Ling CX (2014) Reviewer recommender of pull-requests in github. In: Proceedings of the 30th IEEE International Conference on Software Maintenance and Evolution, pp 609–612

  • Yu Y, Wang H, Filkov V, Devanbu PT, Vasilescu B (2015) Wait for it: Determinants of pull request evaluation latency on github. In: Proceedings of the 12th IEEE/ACM Working Conference on Mining Software Repositories, pp 367–371

  • Zhang Y, Yin G, Yu Y, Wang H (2014) Investigating social media in github’s pull-requests: a case study on ruby on rails. In: Proceedings of the 1st International Workshop on Crowd-based Software Development Methods and Technologies, pp 37–41

  • Zhang X, Chen Y, Gu Y, Zou W, Xie X, Jia X, Xuan J (2018) How do multiple pull requests change the same code: A study of competing pull requests in github. In: Proceedings of the 34th IEEE International Conference on Software Maintenance and Evolution, pp 228–239

Acknowledgments

The authors would like to thank our lab members, Yufeng Zhao, Yiming Chen, and Mengting Zhou, for crawling GitHub project data for the experiments. This work is partly supported by the National Natural Science Foundation of China (Grant Nos. 61690201, 61772014, 61802171, 61872273, 61572375) and the China Scholarship Council Scholarship. Any opinions, findings, and conclusions in this paper are those of the authors only and do not necessarily reflect the views of our sponsors.

Author information

Corresponding author

Correspondence to Zhenyu Chen.

Additional information

Communicated by: Ahmed E. Hassan

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zou, W., Xuan, J., Xie, X. et al. How does code style inconsistency affect pull request integration? An exploratory study on 117 GitHub projects. Empir Software Eng 24, 3871–3903 (2019). https://doi.org/10.1007/s10664-019-09720-x

  • DOI: https://doi.org/10.1007/s10664-019-09720-x
