Effectiveness of exploring historical commits for developer recommendation: an empirical study

Sun, Xiaobing; Yang, Hui; Leung, Hareton; Li, Bin; Li, Hanchao Jerry; Liao, Lingzhi

doi:10.1007/s11704-016-6023-3

Research Article
Published: 07 February 2018

Volume 12, pages 528–544, (2018)

Frontiers of Computer Science

Xiaobing Sun¹,²,⁵,
Hui Yang¹,
Hareton Leung³,
Bin Li¹,
Hanchao Jerry Li⁴ &
…
Lingzhi Liao⁶

Abstract

Developer recommendation is an essential task for resolving incoming issues as software evolves. Many developer recommendation techniques have been proposed in the literature; most of them combine historical commits, as supplementary information, with bug repositories and/or source-code repositories to recommend developers. However, whether the messages in historical commits are always useful has not yet been answered. This article addresses that question through an empirical study of four open-source projects. The results show that: (1) the number of meaningful words in a commit description affects the quality of the commit, and a description with more meaningful words generally reflects developers’ expertise better; (2) recommending developers from commit descriptions works better than recommending them from the relevant files recorded in historical commits; (3) developers tend to change the relevant files that they have changed many times before; and (4) developers generally tend to change files that they have changed recently.
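
Findings (3) and (4) suggest that both how often and how recently a developer changed a file carry signal. The following is only an illustrative sketch of that idea, not the procedure used in the study: the commit records, the rank_developers helper, and the 90-day half-life are all hypothetical assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical commit records: (developer, changed file, commit date).
COMMITS = [
    ("alice", "src/parser.py", date(2016, 1, 10)),
    ("alice", "src/parser.py", date(2016, 3, 2)),
    ("bob", "src/parser.py", date(2016, 5, 20)),
    ("carol", "src/lexer.py", date(2016, 5, 21)),
]


def rank_developers(commits, target_file, today, half_life_days=90.0):
    """Rank developers for target_file by a recency-weighted change count.

    Each past change contributes 0.5 ** (age_in_days / half_life_days), so
    repeated changes accumulate and recent changes weigh more than old ones.
    The half-life is an assumed tuning knob, not a value from the article.
    """
    scores = defaultdict(float)
    for developer, path, committed_on in commits:
        if path != target_file:
            continue  # only changes to the file of interest count
        age_days = (today - committed_on).days
        scores[developer] += 0.5 ** (age_days / half_life_days)
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_developers(COMMITS, "src/parser.py", today=date(2016, 6, 1))
for developer, score in ranking:
    print(f"{developer}: {score:.3f}")
```

With these made-up records, one very recent change by bob outweighs two older changes by alice; a longer half-life would shift the balance toward change frequency.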

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61402396, 61611540347, and 61472344), the Open Project Foundation of Information Technology Research Base of Civil Aviation Administration of China (CAAC-ITRB-201704), the Open Funds of the State Key Laboratory for Novel Software Technology of Nanjing University (KFKT2016B21), the Jiangsu Qin Lan Project, the China Postdoctoral Science Foundation (2015M571489), and the Natural Science Foundation of Yangzhou City (YZ2017113). The authors would like to sincerely thank the anonymous reviewers who provided useful suggestions that helped to improve the article.

Author information

Authors and Affiliations

School of Information Engineering, Yangzhou University, Yangzhou, 225127, China
Xiaobing Sun, Hui Yang & Bin Li
Information Technology Research Base of Civil Aviation Administration of China, Civil Aviation University of China, Tianjin, 300300, China
Xiaobing Sun
Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
Hareton Leung
Coventry University, Coventry CV1 5FB, UK
Hanchao Jerry Li
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210093, China
Xiaobing Sun
Nanjing University of Information Science and Technology, Nanjing, 210044, China
Lingzhi Liao

Corresponding author

Correspondence to Xiaobing Sun.

Additional information

Xiaobing Sun is an associate professor in the School of Information Engineering, Yangzhou University, China. He is a senior China Computer Federation (CCF) and ACM member. His current research interests include software analysis, testing, maintenance, and evolution.

Hui Yang is a master's student in the School of Information Engineering, Yangzhou University, China. He is a CCF student member. His current research interests include recommendation systems for software maintenance, bug fixing, and developer recommendation.

Hareton Leung is the director of the Lab for Software Development and Management, The Hong Kong Polytechnic University, China. His current research interests include software testing, quality assurance, process and quality improvement.

Bin Li is a professor in the School of Information Engineering, Yangzhou University, China. He is a senior CCF member and ACM member. His current research interests include artificial intelligence, machine learning, and crowdsourcing computing.

Hanchao (Jerry) Li is a PhD student at Coventry University, UK. His current research interests are mathematical modeling and modern computer technologies such as machine learning, information retrieval, and artificial intelligence.

Lingzhi Liao is a lecturer at Nanjing University of Information Science & Technology, China. Her current research interests include data mining, image processing, and pattern recognition.

Electronic supplementary material

Supplementary material, approximately 156 KB.

About this article

Cite this article

Sun, X., Yang, H., Leung, H. et al. Effectiveness of exploring historical commits for developer recommendation: an empirical study. Front. Comput. Sci. 12, 528–544 (2018). https://doi.org/10.1007/s11704-016-6023-3

Received: 12 January 2016
Accepted: 18 September 2016
Published: 07 February 2018
Issue Date: June 2018
DOI: https://doi.org/10.1007/s11704-016-6023-3
