Abstract
GitHub is a popular code platform that provides infrastructure to facilitate collaborative development. The Pull Request (PR) is one of its key mechanisms for supporting collaboration: developers are encouraged to submit PRs to request the integration of their contributions. In practice, project maintainers do not integrate every submitted PR into the codebase. Existing studies have investigated factors that affect PR integration. Nevertheless, the code style of PRs, which project maintainers largely take into account, has not yet been studied in depth. In this paper, we perform an exploratory analysis of the effect of code style on PR integration in GitHub. We model code style via the inconsistency between a submitted PR and the existing code in its target codebase, so that our study is not limited to a specific definition of code style. We conducted our experiments on 50,092 closed PRs in 117 Java projects. Our findings show that: (1) Code style inconsistency does exist between PRs and the codebase. (2) Several code style criteria, namely those on how to use spaces or indents, write comments, and keep code lines to a suitable length, tend to show more inconsistency among PRs. (3) A PR that is consistent with the current code style tends to be merged into the codebase more easily. (4) A PR that violates the current code style is likely to take more time to get closed. Our study gives developers evidence on how to deliver better contributions and facilitate efficient collaboration.




Notes
In GitHub, a “repository” denotes a project in general. In this paper, we use “repository” and “project” interchangeably.
It is common for a project to have a readme file or a contribution file. The readme file broadly describes the project, while the contribution file mainly gives tips on how to contribute to the project.
In this study, some motivating examples (i.e., GNU, Google, and GitHub) came mainly from a manual search of code style documentation in well-known open source communities or in open source projects that originated in companies, while the other examples (i.e., mongodb/mongo, rubinius/rubinius, and querydsl/querydsl) were collected by manually checking the documents and commit logs of randomly selected popular projects on GitHub.
Checkstyle is a highly configurable tool for checking code style. It supports the code styles of Google and Oracle. In our experiment, we configured Checkstyle to check whether a piece of code violates 37 code style criteria.
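As a rough illustration of how style inconsistency between a PR and its target codebase could be quantified from per-criterion violation counts, consider the minimal sketch below. It is not the metric defined in the paper: the function names, the normalization by lines of code, and the use of an absolute difference are all assumptions made for illustration, and the two criterion names are used only as examples.

```python
from typing import Dict


def violation_density(violations: Dict[str, int], lines_of_code: int) -> Dict[str, float]:
    """Return violations per line of code for each style criterion."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return {criterion: count / lines_of_code for criterion, count in violations.items()}


def style_inconsistency(
    pr_violations: Dict[str, int],
    pr_loc: int,
    base_violations: Dict[str, int],
    base_loc: int,
) -> Dict[str, float]:
    """Hypothetical inconsistency score per criterion.

    Assumption: inconsistency is the absolute gap between the violation
    density of the PR code and that of the existing codebase. A criterion
    missing from one side is treated as having zero violations there.
    """
    pr_density = violation_density(pr_violations, pr_loc)
    base_density = violation_density(base_violations, base_loc)
    criteria = sorted(set(pr_density) | set(base_density))
    return {
        c: abs(pr_density.get(c, 0.0) - base_density.get(c, 0.0))
        for c in criteria
    }


# Illustrative counts only; these numbers are not data from the study.
example = style_inconsistency(
    pr_violations={"LineLength": 4, "Indentation": 0},
    pr_loc=100,
    base_violations={"LineLength": 50, "Indentation": 10},
    base_loc=10_000,
)
```

Under these assumptions, a criterion scores high when the PR violates it at a very different rate than the surrounding code does, regardless of whether the codebase itself follows any external style guide.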
Acknowledgments
The authors would like to thank our lab members, Yufeng Zhao, Yiming Chen, and Mengting Zhou, for crawling the GitHub project data used in the experiments. This work is partly supported by the National Natural Science Foundation of China (Grant Nos. 61690201, 61772014, 61802171, 61872273, and 61572375) and the China Scholarship Council Scholarship. Any opinions, findings, and conclusions in this paper are those of the authors only and do not necessarily reflect the views of our sponsors.
Additional information
Communicated by: Ahmed E. Hassan
Cite this article
Zou, W., Xuan, J., Xie, X. et al. How does code style inconsistency affect pull request integration? An exploratory study on 117 GitHub projects. Empir Software Eng 24, 3871–3903 (2019). https://doi.org/10.1007/s10664-019-09720-x
DOI: https://doi.org/10.1007/s10664-019-09720-x