Code churn estimation using organisational and code metrics: An experimental comparison

https://doi.org/10.1016/j.infsof.2011.09.004

Abstract

Context

Source code revision control systems contain vast amounts of data that can be exploited for various purposes. For example, the data can be used as a base for estimating future code maintenance effort in order to plan software maintenance activities. Previous work has extensively studied the use of metrics extracted from object-oriented source code to estimate future coding effort. In comparison, the use of other types of metrics for this purpose has received significantly less attention.

Objective

This paper applies machine learning techniques to unveil predictors of yearly cumulative code churn of software projects on the basis of metrics extracted from revision control systems.

Method

The study is based on a collection of object-oriented code metrics, XML code metrics, and organisational metrics. Several models are constructed with different subsets of these metrics. The predictive power of these models is analysed based on a dataset extracted from eight open-source projects.

Results

The study shows that a code churn estimation model built purely with organisational metrics is superior to one built purely with code metrics. However, a combined model provides the highest predictive power.

Conclusion

The results suggest that code metrics in general, and XML metrics in particular, are complementary to organisational metrics for the purpose of estimating code churn.

Introduction

Accurately estimating future code maintenance effort of a software system is one of the keystones of software project planning [1]. Over the past decades, significant attention has been paid to estimating code maintenance effort based on object-oriented code metrics [2], [3], [4]. In the meantime, the eXtensible Markup Language (XML) has grown into a ubiquitous language in contemporary software projects. In a separate study [5], we found that in the context of open-source software development, XML files frequently co-evolve with other types of files – about 20% of changes to non-XML files were accompanied by changes to XML files. Furthermore, the widespread adoption of revision control systems in software projects has made it possible to readily extract a wealth of metrics pertaining to organisational aspects of software projects.
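The co-change measurement behind that 20% figure can be sketched as follows. The commit-log representation and function names here are our own assumptions for illustration, not the tooling used in [5]:

```python
def xml_cochange_ratio(commits):
    """Fraction of commits that touch non-XML files and also touch XML files.

    `commits` is a list of per-commit file-path lists -- a hypothetical flat
    export of a revision control log.
    """
    def is_xml(path):
        return path.lower().endswith((".xml", ".xsl", ".xslt"))

    # Commits that change at least one non-XML file are the population of interest.
    relevant = [c for c in commits if any(not is_xml(f) for f in c)]
    # Of those, count the commits that also change an XML/XSL(T) file.
    co_changing = [c for c in relevant if any(is_xml(f) for f in c)]
    return len(co_changing) / len(relevant) if relevant else 0.0

commits = [
    ["Main.java"],
    ["Build.java", "pom.xml"],     # non-XML and XML change together
    ["schema.xml"],                # XML-only commit: not counted
    ["Util.java"],
    ["Report.java", "style.xsl"],  # non-XML and XML change together
]
print(xml_cochange_ratio(commits))  # → 0.5
```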

These trends have opened the possibility of building more accurate models for code maintenance prediction by making use of a wider set of metrics than traditional object-oriented code metrics. Accordingly, this paper undertakes to study the relative performance and potential complementarity of different categories of metrics in the context of code maintenance effort estimation.

A common indirect measure of code maintenance effort, which we adopt in this paper, is that of code churn: the sum of the numbers of lines of code added, modified, and deleted between two revisions of a software module [6]. In addition to providing an indicator of code maintenance effort, code churn has also been shown to be correlated with software defects [7], [8]. The present study focuses on estimating long-term code churn. Specifically, the dependent variable of our study is the cumulative yearly code churn of a project: the sum of the cumulative yearly code churn of all (non-binary) files in a project, where the cumulative code churn of a file is the sum of the code churn of the file across all its revisions in a given 12-month period.
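The dependent variable can be made concrete with a minimal sketch. The flat tuple format for revision data is our own assumption; the churn definitions follow [6] and the paragraph above:

```python
from collections import defaultdict

def revision_churn(added, modified, deleted):
    """Code churn of one revision: lines added + modified + deleted [6]."""
    return added + modified + deleted

def yearly_cumulative_churn(revisions):
    """Cumulative yearly code churn of a project.

    `revisions` is an iterable of (filename, year, added, modified, deleted)
    tuples -- a hypothetical flat export from a revision control system.
    Per-file churn is summed over all revisions in each 12-month period,
    then summed across all files.  Returns {year: total churn}.
    """
    totals = defaultdict(int)
    for filename, year, added, modified, deleted in revisions:
        totals[year] += revision_churn(added, modified, deleted)
    return dict(totals)

revs = [
    ("Main.java",  2010, 10, 5, 2),  # churn 17
    ("build.xml",  2010,  3, 0, 1),  # churn 4
    ("Main.java",  2011,  7, 2, 0),  # churn 9
]
print(yearly_cumulative_churn(revs))  # → {2010: 21, 2011: 9}
```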

This paper specifically addresses the following research question: What is the relative performance of models built to estimate the yearly cumulative code churn of a project based on:

  1. XML/XSLT code metrics.
  2. Imperative/object-oriented programming language metrics.
  3. Organisational metrics.
  4. Organisational metrics and code metrics combined.

Given this research question, we have a choice between hypothesising that certain relations exist between a selected set of input metrics and code churn, or uncovering such relations in an exploratory manner. In the first approach we would start with a set of hypotheses and use statistical conformance testing to validate them on the chosen dataset. However, as mentioned above, we are not aware of previous studies on possible relations between XML metrics and code churn, so there is little basis for formulating a priori hypotheses about such relations. On the other hand, there is a wealth of data available from open-source repositories that can be leveraged to uncover such relations. Accordingly, we adopt a bottom-up approach based on data mining and exploratory data analysis.

The adopted data mining approach comprises the following steps:

  1. Data pre-processing: choice of prediction targets and proposal of input features (attributes that might influence the value to be predicted), data gathering, normalisation, and cleansing.
  2. Learning: choice of data mining algorithms and application of these algorithms.
  3. Results validation: evaluation of model fit using standard statistical techniques.
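The three steps above can be sketched end to end. This is a minimal stand-in, not the paper's actual toolchain: z-score normalisation stands in for the pre-processing, a one-feature least-squares fit stands in for the learning algorithms, and mean squared error stands in for the statistical validation:

```python
import statistics

def preprocess(rows):
    """Step 1: cleanse incomplete records and z-score normalise each column."""
    rows = [r for r in rows if None not in r]           # cleansing
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]   # avoid division by zero
    return [tuple((v - m) / s for v, m, s in zip(r, means, sds)) for r in rows]

def fit_least_squares(xs, ys):
    """Step 2: fit y = a*x + b by ordinary least squares (stand-in learner)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def mean_squared_error(model, xs, ys):
    """Step 3: evaluate model fit on (ideally held-out) data."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]        # toy feature / churn pairs
model = fit_least_squares(xs, ys)
print(model)                                # → (2.0, 0.0)
print(mean_squared_error(model, xs, ys))    # → 0.0
```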

This data mining approach allows us to identify relations between the input features and the prediction target and, in doing so, to uncover the existence of a predictive model. However, the data mining approach itself does not allow us to explain the cause of these relations. To compensate for this shortcoming, exploratory data analysis was used in this study to gain an understanding of the models created by the data mining algorithms.

Compared to a conformance testing approach, data mining and exploratory data analysis offer the benefit of not requiring an a priori specific model to test. The aim of exploratory data analysis is to propose models that can then be conformance-tested. Moreover, data mining and exploratory data analysis can uncover non-intuitive relations. In fact, the results of this study show, among other things, that there are no generally-applicable straightforward models to estimate coding effort – that is, linear models based on one or a very small number of interactions between input features. The models with better predictive power uncovered in the study involve a non-trivial number of interactions between input features.

The paper is structured as follows. Section 2 introduces the dataset used in the study. Section 3 describes the features extracted for training, together with the training algorithms and process employed. Next, evaluation results are presented in Section 4. In Section 5, related work is discussed. Finally, conclusions and directions for future work are discussed in Section 6.

Section snippets

Dataset

Eight open-source software project repositories were mined for data (see Table 1). These projects were chosen to represent different types of software products: WSO2 is an enterprise application platform; docbook is a documentation formatting tool; eXist is a database management system; Dia is a drawing tool; Groovy and Valgrind are software development tools. The projects also differed in terms of age. WSO2 is a relatively young project with only three years of version data

Features and training algorithms

According to the research questions, we are interested in building models to predict code churn based on imperative/object-oriented code metrics, XML/XSL code metrics, and organisational metrics. Accordingly, we identified a set of metrics covering all these categories (see Table 2). The following sub-sections explain each category of metrics in turn.

Results

We trained and tested several models based on different subsets of input features, specifically, using OO code metrics only, using XML/XSL metrics only, using organisational metrics only, and using combinations of these subsets of features. Each model was validated using cross-validation with 7:1 splits by projects. In other words, we selected one of the eight projects for testing, and used the remaining seven projects for training the model, then took another project for testing and used the
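The 7:1 by-project cross-validation scheme described here is leave-one-group-out validation, where each group is a project. A minimal sketch of the split generation (data structures are our own assumptions):

```python
def leave_one_project_out(projects):
    """Yield (train, test) splits: each project is held out once for testing
    while all remaining projects are used for training -- the 7:1 scheme
    when there are eight projects.

    `projects` maps a project name to its list of feature records.
    """
    names = sorted(projects)
    for held_out in names:
        train = {n: projects[n] for n in names if n != held_out}
        test = {held_out: projects[held_out]}
        yield train, test

# Placeholder project names; the study used eight open-source projects.
data = {f"project_{i}": [] for i in range(8)}
splits = list(leave_one_project_out(data))
print(len(splits))  # → 8 splits, each training on 7 projects
```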

Related work

Nagappan et al. used metrics of organisational structure to identify failure-prone binaries with great success [11]. They drew on more organisational information than is available from version control systems; for example, they used information about organisation members who do not commit code and about the organisational hierarchy. Their results also showed that code churn is the second-best predictor of failure-prone binaries.

One of the earliest attempts to estimate code churn was made by Khoshgoftaar et al. [12].

Conclusions and future work

We have shown that, in the context of projects rich in XML and XSLT, organisational metrics provide a better basis for predicting long-term code churn for individual files than code metrics. In fact, models trained on organisational metrics were clearly superior to other models considered in the study. Organisational models give excellent predictions with low error (error below 3400 LOC/year for 95% of the cases). These models can aid in planning the projects by providing insights into the

Acknowledgements

This research was started during a visit of the first author to the Software Engineering Group at University of Zurich (visit funded by the ESF DoRa 6 Program). We thank Harald Gall and the members of his group for their valuable advice. The work is also funded by ERDF via the Estonian Centre of Excellence in Computer Science.

