DOI: 10.1145/3379597.3387489

On the Shoulders of Giants: A New Dataset for Pull-based Development Research

Published: 18 September 2020

Abstract

Pull-based development is a widely adopted paradigm for collaboration in distributed software development, attracting attention from both academia and industry. To better study the pull-based development model, this paper presents a new dataset containing 96 features collected from 11,230 projects and 3,347,937 pull requests. We describe the creation process and explain the features in detail. To the best of our knowledge, it is the largest and most comprehensive dataset for pull-based development research, moving toward a complete picture of the field.

Published In

MSR '20: Proceedings of the 17th International Conference on Mining Software Repositories
June 2020
675 pages
ISBN: 9781450375177
DOI: 10.1145/3379597
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. distributed software development
  2. pull request
  3. pull-based development

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

MSR '20
Cited By

  • (2025) E-PRedictor: an approach for early prediction of pull request acceptance. Science China Information Sciences 68:5. DOI: 10.1007/s11432-022-3953-4. Online publication date: 16-Jan-2025
  • (2024) Sharing Software-Evolution Datasets: Practices, Challenges, and Recommendations. Proceedings of the ACM on Software Engineering 1:FSE (2051-2074). DOI: 10.1145/3660798. Online publication date: 12-Jul-2024
  • (2024) State-of-the-practice in quality assurance in Java-based open source software development. Software: Practice and Experience 54:8 (1408-1446). DOI: 10.1002/spe.3321. Online publication date: 4-Mar-2024
  • (2023) How social interactions can affect Modern Code Review. Frontiers in Computer Science 5. DOI: 10.3389/fcomp.2023.1178040. Online publication date: 11-May-2023
  • (2023) Understanding the Helpfulness of Stale Bot for Pull-Based Development: An Empirical Study of 20 Large Open-Source Projects. ACM Transactions on Software Engineering and Methodology 33:2 (1-43). DOI: 10.1145/3624739. Online publication date: 23-Dec-2023
  • (2023) On Wasted Contributions: Understanding the Dynamics of Contributor-Abandoned Pull Requests–A Mixed-Methods Study of 10 Large Open-Source Projects. ACM Transactions on Software Engineering and Methodology 32:1 (1-39). DOI: 10.1145/3530785. Online publication date: 13-Feb-2023
  • (2023) Pull Request Decisions Explained: An Empirical Overview. IEEE Transactions on Software Engineering 49:2 (849-871). DOI: 10.1109/TSE.2022.3165056. Online publication date: 1-Feb-2023
  • (2023) Testability Refactoring in Pull Requests: Patterns and Trends. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) (1508-1519). DOI: 10.1109/ICSE48619.2023.00131. Online publication date: May-2023
  • (2022) Do small code changes merge faster? Proceedings of the 19th International Conference on Mining Software Repositories (537-548). DOI: 10.1145/3524842.3528448. Online publication date: 23-May-2022
  • (2022) Pull request latency explained: an empirical overview. Empirical Software Engineering 27:6. DOI: 10.1007/s10664-022-10143-4. Online publication date: 1-Nov-2022
