ABSTRACT
Forming a training set for a source-code change prediction model, e.g., one based on association rule mining or machine learning techniques, requires commits from the source code history. Traceability between releases and commits facilitates a systematic choice of history in units of the project evolution scale (i.e., the commits that constitute a software release). For example, the major release 25.0 of Chrome maps to the earliest revision 157687 and the latest revision 165096 in the trunk. Using this traceability, an empirical study is reported on the frequency distribution of file changes for different release windows. In Chrome, the majority (50%) of the committed files change only once between a pair of consecutive releases. This trend reverses once the window is expanded to at least 10 releases: the majority (50%) of the files change multiple times when the commits constituting 10 or more releases are considered. These results suggest that a training set of at least 10 releases is needed to provide prediction coverage for the majority of the files.
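The measurement described in the abstract can be illustrated with a minimal sketch. It assumes the release-to-commit mapping is already available as an ordered list of (release, earliest revision, latest revision) tuples, and that each commit is a (revision, changed files) pair. The function names, the toy releases R1 and R2, and the file names are hypothetical and are not taken from the study's tooling or data.

```python
from collections import Counter

# Hypothetical toy data: each release maps to an inclusive revision range,
# in the same form as the abstract's example (release -> earliest, latest).
release_ranges = [("R1", 1, 3), ("R2", 4, 6)]

# Hypothetical commits: (revision number, files changed in that commit).
commits = [
    (1, ["a.cc"]),
    (2, ["b.cc"]),
    (3, ["c.cc"]),
    (4, ["a.cc", "d.cc"]),
    (5, ["b.cc"]),
    (6, ["a.cc"]),
]


def window_range(ranges, start, size):
    """Return (earliest, latest) revision spanned by `size` consecutive releases."""
    window = ranges[start:start + size]
    if not window:
        raise ValueError("release window is empty")
    return window[0][1], window[-1][2]


def files_changed_in_window(commit_list, first_rev, last_rev):
    """Count, per file, the commits in [first_rev, last_rev] that touch it."""
    counts = Counter()
    for rev, files in commit_list:
        if first_rev <= rev <= last_rev:
            # set() so a file listed twice in one commit counts once.
            counts.update(set(files))
    return counts


def share_changed_once(counts):
    """Fraction of the changed files that change exactly once in the window."""
    if not counts:
        return 0.0
    once = sum(1 for n in counts.values() if n == 1)
    return once / len(counts)


if __name__ == "__main__":
    for size in (1, 2):
        first, last = window_range(release_ranges, 0, size)
        counts = files_changed_in_window(commits, first, last)
        print(size, first, last, share_changed_once(counts))
```

In this toy history, widening the window from one release to two lowers the share of files that change only once, which is the kind of shift the study reports for Chrome at a larger scale.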
How Often does a Source Code Unit Change within a Release Window?