
Improving the pull requests review process using learning-to-rank algorithms

Empirical Software Engineering

Abstract

Collaborative software development platforms (such as GitHub and GitLab) have become increasingly popular, attracting thousands of external contributors to open source projects. External contributors submit their contributions via pull requests, which must be reviewed before being integrated into the central repository. During the review process, reviewers provide feedback to contributors, run tests, and request further modifications before finally accepting or rejecting the contributions. The role of reviewers is key to maintaining an effective review process. However, the number of decisions that reviewers can make is far outpaced by the growing number of pull request submissions. To help reviewers make more decisions on pull requests within their limited working time, we propose a learning-to-rank (LtR) approach that recommends pull requests that can be reviewed quickly. Unlike a binary model that predicts the decision on a pull request, our approach ranks the existing list of pull requests by their likelihood of being quickly merged or rejected. We use 18 metrics to build LtR models with six LtR algorithms, including ListNet, RankNet, MART, and random forest. We conduct empirical studies on 74 Java projects to compare the performance of the six LtR algorithms, and we compare the best-performing algorithm against two baselines from previous research on pull request prioritization: the first-in-first-out (FIFO) baseline and the small-size-first baseline. We then conduct a survey with GitHub reviewers to understand how code reviewers perceive the usefulness of our approach. We observe that: (1) the random forest LtR algorithm outperforms the five other LtR algorithms at ranking quickly merged pull requests; (2) the random forest LtR algorithm performs better than both the FIFO and small-size-first baselines, meaning that our LtR approach can help reviewers make more decisions and improve their productivity; (3) the contributor’s social connections and the contributor’s experience are the most influential metrics for ranking pull requests that can be quickly merged; and (4) the GitHub reviewers who participated in our survey acknowledge that our approach complements existing prioritization baselines and helps them prioritize and review more pull requests.
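To make the ranking idea concrete, the sketch below trains a pointwise random forest to score open pull requests by their likelihood of a quick decision and presents them in descending order. This is a minimal illustration, not the authors' implementation: scikit-learn's RandomForestRegressor stands in for the random forest LtR algorithm evaluated in the paper, and the three feature columns (contributor experience, social connections, change size) are hypothetical placeholders for the 18 metrics used in the study.

```python
# Minimal sketch (not the authors' implementation): rank open pull requests by a
# model's predicted likelihood of being merged or rejected quickly.
from sklearn.ensemble import RandomForestRegressor
import numpy as np

# Each row describes one pull request; the columns are illustrative placeholders
# for metrics such as contributor experience, social connections, and change size.
train_features = np.array([
    [12, 0.8, 150],   # experienced contributor, well connected, medium change
    [1,  0.1, 900],   # newcomer, few connections, large change
    [30, 0.9, 40],    # core contributor, small change
])
# Relevance labels: 1 if the pull request received a quick decision, else 0.
train_labels = np.array([1, 0, 1])

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(train_features, train_labels)

# Score the currently open pull requests and present them in descending order,
# so reviewers see the likely quick decisions first.
open_prs = {"PR-101": [5, 0.6, 200], "PR-102": [25, 0.95, 30]}
scores = model.predict(np.array(list(open_prs.values())))
ranking = sorted(zip(open_prs, scores), key=lambda x: x[1], reverse=True)
print(ranking)
```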


Notes

  1. http://ghtorrent.org/relational.html

  2. https://sourceforge.net/p/lemur/wiki/RankLib/ (an input-format sketch follows these notes)

  3. https://cran.r-project.org/web/packages/beanplot/vignettes/beanplot.eps
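The RankLib tool referenced in note 2 consumes feature vectors in an SVM-light/LETOR-style text format. The sketch below writes such a training file for a few hypothetical pull requests; the feature values, the query-id grouping by project, and the command line in the final comment (including ranker id 8 for random forests) are assumptions to verify against the RankLib documentation.

```python
# Minimal sketch: write labelled pull-request feature vectors in the
# SVM-light/LETOR-style format that RankLib reads. All values are placeholders.
def write_ranklib_line(f, label, qid, features, comment=""):
    # One line per pull request: "<label> qid:<q> 1:<v1> 2:<v2> ... # <comment>"
    cols = " ".join(f"{i + 1}:{v}" for i, v in enumerate(features))
    f.write(f"{label} qid:{qid} {cols} # {comment}\n")

with open("train.txt", "w") as f:
    # label 1 = the pull request was merged/rejected quickly, 0 = otherwise;
    # qid groups the pull requests of the same project.
    write_ranklib_line(f, 1, 1, [12, 0.8, 150], "PR-17 of project 1")
    write_ranklib_line(f, 0, 1, [1, 0.1, 900], "PR-18 of project 1")
    write_ranklib_line(f, 1, 2, [30, 0.9, 40], "PR-5 of project 2")

# A RankLib training call would then look roughly like (flags and ranker ids are
# assumptions to check against the RankLib documentation):
#   java -jar RankLib.jar -train train.txt -ranker 8 -metric2t NDCG@10 -save model.txt
```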


Author information


Corresponding author

Correspondence to Guoliang Zhao.

Additional information

Communicated by: Alexander Serebrenik

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Decision Table

Table 15 Decision table for highly correlated pairs

Appendix B: Survey E-mail Template

Dear %s,

My name is Guoliang Zhao. I am a PhD student at Queen’s University, Canada. I am inviting you to participate in a survey (it consists of only 5 questions and will take no more than 5 minutes of your time) about improving the pull request review process, because you are an active GitHub code reviewer. We apologize in advance if our email is unwanted or a waste of your time. Your feedback would be highly valuable to our research. This study will help us understand how GitHub code reviewers would react to our pull request recommendation approach. There are no mandatory questions in our survey. To participate, please click on the following link: https://docs.google.com/forms/d/e/1FAIpQLScUWVQkpS42fVNNfshEKwrKKgb7EKJb5HxLVLbKpNJYbnqk1A/viewform?usp=sf_link

To compensate you for your time, you may win one $10 Amazon gift card if you complete the full questionnaire. We will randomly select 20% of participants as winners. We would also be happy to share with you the results of the survey, if you are interested.

If you have any questions about this survey, or difficulty in accessing the site or completing the survey, please contact Guoliang Zhao at g.zhao@queensu.ca or Daniel Alencar da Costa at daniel.alencar@queensu.ca. Thank you in advance for your time and for providing this important feedback!

Best regards,

Guoliang


About this article


Cite this article

Zhao, G., da Costa, D.A. & Zou, Y. Improving the pull requests review process using learning-to-rank algorithms. Empir Software Eng 24, 2140–2170 (2019). https://doi.org/10.1007/s10664-019-09696-8

