Abstract
Pull-based development is widely used in popular social coding environments such as GitHub and GitLab for both internal and external contributions. When critical bug fixes or features are committed to the main branch of a project, it is often desirable to also port those changes to other stable branches. This process is referred to as backporting, and the pull-requests that carry such changes are known as backports. Backports are typically identified only after extensive discussion with collaborators. This can take many days and commonly results in tags and references to the original pull-requests (i.e., pull-requests for the main branch) being missed. To help software development teams better identify and manage backports, we propose ReBack (Recommending Backports), a tool based on a deep-learning model that automatically identifies backports from pull-requests and their related reviews, discussions, metadata, and committed code. On 80,000 pull-requests from 17 GitHub projects, ReBack predicted backports with 90.98% precision and 91.81% recall. Although these results are promising, more research is needed to further support backporting. This includes research into automatically porting a pull-request, which could further reduce the cost of managing software versions and branches.
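To make the reported metrics concrete, the sketch below shows how precision and recall are conventionally computed for a binary classifier that treats "backport" as the positive class, and how F1 combines them as a harmonic mean. It is not ReBack's code. The names `Confusion`, `precision`, `recall`, and `f1` are hypothetical, and the counts in the usage example are invented for illustration rather than taken from ReBack's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion counts for a binary backport classifier (positive class = backport).

    Hypothetical helper for illustration; not part of ReBack.
    """
    tp: int  # backports correctly flagged as backports
    fp: int  # non-backports wrongly flagged as backports
    fn: int  # backports the classifier failed to flag


def precision(c: Confusion) -> float:
    # Fraction of flagged pull-requests that really are backports.
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: Confusion) -> float:
    # Fraction of actual backports that the classifier flagged.
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


def f1(p: float, r: float) -> float:
    # Harmonic mean of precision and recall; 0.0 when both are zero.
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Invented counts, for illustration only.
    c = Confusion(tp=90, fp=10, fn=5)
    p, r = precision(c), recall(c)
    print(f"precision={p:.4f} recall={r:.4f} f1={f1(p, r):.4f}")
```

Applied to the two aggregate figures in the abstract, `f1(0.9098, 0.9181)` gives roughly 0.914. This value is only the harmonic mean of the two reported numbers. It may differ from an F1 score reported in the paper itself, for example if metrics there are averaged per project.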
Acknowledgements
This research is supported in part by Discovery grants from the Natural Sciences and Engineering Research Council of Canada (NSERC), an NSERC Collaborative Research and Training Experience (CREATE) grant, and two Canada First Research Excellence Fund (CFREF) grants coordinated by the Global Institute for Food Security (GIFS) and the Global Institute for Water Security (GIWS).
Author information
Contributions
All authors contributed to the study's conception and design. Material preparation, data collection, and analysis were performed by Debasish Chakroborti. The first draft of the manuscript was written by Debasish Chakroborti, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Chakroborti, D., Schneider, K.A. & Roy, C.K. ReBack: recommending backports in social coding environments. Autom Softw Eng 31, 18 (2024). https://doi.org/10.1007/s10515-024-00416-1
DOI: https://doi.org/10.1007/s10515-024-00416-1