Code churn estimation using organisational and code metrics: An experimental comparison

https://doi.org/10.1016/j.infsof.2011.09.004

Abstract

Context

Source code revision control systems contain vast amounts of data that can be exploited for various purposes. For example, the data can be used as a base for estimating future code maintenance effort in order to plan software maintenance activities. Previous work has extensively studied the use of metrics extracted from object-oriented source code to estimate future coding effort. In comparison, the use of other types of metrics for this purpose has received significantly less attention.

Objective

This paper applies machine learning techniques to unveil predictors of yearly cumulative code churn of software projects on the basis of metrics extracted from revision control systems.

Method

The study is based on a collection of object-oriented code metrics, XML code metrics, and organisational metrics. Several models are constructed with different subsets of these metrics. The predictive power of these models is analysed based on a dataset extracted from eight open-source projects.

Results

The study shows that a code churn estimation model built purely with organisational metrics is superior to one built purely with code metrics. However, a combined model provides the highest predictive power.

Conclusion

The results suggest that code metrics in general, and XML metrics in particular, are complementary to organisational metrics for the purpose of estimating code churn.

Introduction

Accurately estimating future code maintenance effort of a software system is one of the keystones of software project planning [1]. Over the past decades, significant attention has been paid to estimating code maintenance effort based on object-oriented code metrics [2], [3], [4]. In the meantime, the eXtensible Markup Language (XML) has grown into a ubiquitous language in contemporary software projects. In a separate study [5], we found that in the context of open-source software development, XML files frequently co-evolve with other types of files – about 20% of changes to non-XML files were accompanied by changes to XML files. Furthermore, the widespread adoption of revision control systems in software projects has made it possible to readily extract a wealth of metrics pertaining to organisational aspects of software projects.
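The co-change measurement behind that 20% figure can be sketched as follows. The commit-log representation and function names here are our own assumptions for illustration, not the tooling used in [5]:

```python
def xml_cochange_ratio(commits):
    """Fraction of commits that touch non-XML files and also touch XML files.

    `commits` is a list of per-commit file-path lists -- a hypothetical flat
    export of a revision control log.
    """
    def is_xml(path):
        return path.lower().endswith((".xml", ".xsl", ".xslt"))

    # Commits that change at least one non-XML file are the population of interest.
    relevant = [c for c in commits if any(not is_xml(f) for f in c)]
    # Of those, count the commits that also change an XML/XSL(T) file.
    co_changing = [c for c in relevant if any(is_xml(f) for f in c)]
    return len(co_changing) / len(relevant) if relevant else 0.0

commits = [
    ["Main.java"],
    ["Build.java", "pom.xml"],     # non-XML and XML change together
    ["schema.xml"],                # XML-only commit: not counted
    ["Util.java"],
    ["Report.java", "style.xsl"],  # non-XML and XML change together
]
print(xml_cochange_ratio(commits))  # → 0.5
```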

These trends have opened the possibility of building more accurate models for code maintenance prediction by making use of a wider set of metrics than traditional object-oriented code metrics. Accordingly, this paper undertakes to study the relative performance and potential complementarity of different categories of metrics in the context of code maintenance effort estimation.

A common indirect measure of code maintenance effort, which we adopt in this paper, is that of code churn: the sum of the numbers of lines of code added, modified, and deleted between two revisions of a software module [6]. In addition to providing an indicator of code maintenance effort, code churn has also been shown to be correlated with software defects [7], [8]. The present study focuses on estimating long-term code churn. Specifically, the dependent variable of our study is the cumulative yearly code churn of a project: the sum of the cumulative yearly code churn of all (non-binary) files in a project, where the cumulative code churn of a file is the sum of the code churn of the file across all its revisions in a given 12-month period.
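The dependent variable can be made concrete with a minimal sketch. The flat tuple format for revision data is our own assumption; the churn definitions follow [6] and the paragraph above:

```python
from collections import defaultdict

def revision_churn(added, modified, deleted):
    """Code churn of one revision: lines added + modified + deleted [6]."""
    return added + modified + deleted

def yearly_cumulative_churn(revisions):
    """Cumulative yearly code churn of a project.

    `revisions` is an iterable of (filename, year, added, modified, deleted)
    tuples -- a hypothetical flat export from a revision control system.
    Per-file churn is summed over all revisions in each 12-month period,
    then summed across all files.  Returns {year: total churn}.
    """
    totals = defaultdict(int)
    for filename, year, added, modified, deleted in revisions:
        totals[year] += revision_churn(added, modified, deleted)
    return dict(totals)

revs = [
    ("Main.java",  2010, 10, 5, 2),  # churn 17
    ("build.xml",  2010,  3, 0, 1),  # churn 4
    ("Main.java",  2011,  7, 2, 0),  # churn 9
]
print(yearly_cumulative_churn(revs))  # → {2010: 21, 2011: 9}
```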

This paper specifically addresses the following research question: What is the relative performance of models built to estimate the yearly cumulative code churn of a project based on:

  1. XML/XSLT code metrics.
  2. Imperative/object-oriented programming language metrics.
  3. Organisational metrics.
  4. Organisational metrics and code metrics combined.

Given this research question, we have a choice between hypothesising that certain relations exist between a selected set of input metrics and code churn, or uncovering such relations in an exploratory manner. In the first approach we would start with a set of hypotheses and use statistical conformance testing to validate them on the chosen dataset. However, as mentioned above, we are not aware of previous studies on possible relations between XML metrics and code churn, so there is little basis for formulating a priori hypotheses about such relations. On the other hand, there is a wealth of data available from open-source repositories that can be leveraged to uncover such relations. Accordingly, we adopt a bottom-up approach based on data mining and exploratory data analysis.

The adopted data mining approach comprises the following steps:

  1. Data pre-processing: choice of prediction targets and proposal of input features (attributes that might influence the value to be predicted), data gathering, normalisation, and cleansing.
  2. Learning: choice of data mining algorithms and application of these algorithms.
  3. Results validation: evaluation of model fit using standard statistical techniques.
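The three steps above can be sketched end to end. This is a minimal stand-in, not the paper's actual toolchain: z-score normalisation stands in for the pre-processing, a one-feature least-squares fit stands in for the learning algorithms, and mean squared error stands in for the statistical validation:

```python
import statistics

def preprocess(rows):
    """Step 1: cleanse incomplete records and z-score normalise each column."""
    rows = [r for r in rows if None not in r]           # cleansing
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]   # avoid division by zero
    return [tuple((v - m) / s for v, m, s in zip(r, means, sds)) for r in rows]

def fit_least_squares(xs, ys):
    """Step 2: fit y = a*x + b by ordinary least squares (stand-in learner)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def mean_squared_error(model, xs, ys):
    """Step 3: evaluate model fit on (ideally held-out) data."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]        # toy feature / churn pairs
model = fit_least_squares(xs, ys)
print(model)                                # → (2.0, 0.0)
print(mean_squared_error(model, xs, ys))    # → 0.0
```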

This data mining approach allows us to identify relations between the input features and the prediction target and, in doing so, to uncover the existence of a predictive model. However, the data mining approach itself does not allow us to explain the cause of these relations. To compensate for this shortcoming, exploratory data analysis was used in this study to gain an understanding of the models created by the data mining algorithms.

Compared to a conformance testing approach, data mining and exploratory data analysis offer the benefit of not requiring an a priori specific model to test. The aim of exploratory data analysis is to propose models that can then be conformance-tested. Moreover, data mining and exploratory data analysis can uncover non-intuitive relations. In fact, the results of this study show, among other things, that there are no generally-applicable straightforward models to estimate coding effort – that is, linear models based on one or a very small number of interactions between input features. The models with better predictive power uncovered in the study involve a non-trivial number of interactions between input features.

The paper is structured as follows. Section 2 introduces the dataset used in the study. Section 3 describes the features extracted for training, together with the training algorithms and process employed. Next, evaluation results are presented in Section 4. In Section 5, related work is discussed. Finally, conclusions and directions for future work are discussed in Section 6.

Section snippets

Dataset

Eight open-source software project repositories were mined for data (see Table 1). These projects were chosen to represent different types of software products: WSO2 is an enterprise application platform; docbook is a documentation formatting tool; eXist is a database management system; Dia is a drawing tool; Groovy and Valgrind are software development tools. The projects also differed in terms of age. WSO2 is a relatively young project with only three years of version data

Features and training algorithms

According to the research questions, we are interested in building models to predict code churn based on imperative/object-oriented code metrics, XML/XSL code metrics, and organisational metrics. Accordingly, we identified a set of metrics covering all these categories (see Table 2). The following sub-sections explain each category of metrics in turn.

Results

We trained and tested several models based on different subsets of input features, specifically, using OO code metrics only, using XML/XSL metrics only, using organisational metrics only, and using combinations of these subsets of features. Each model was validated using cross-validation with 7:1 splits by projects. In other words, we selected one of the eight projects for testing, and used the remaining seven projects for training the model, then took another project for testing and used the
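The 7:1 by-project cross-validation scheme described here is leave-one-group-out validation, where each group is a project. A minimal sketch of the split generation (data structures are our own assumptions):

```python
def leave_one_project_out(projects):
    """Yield (train, test) splits: each project is held out once for testing
    while all remaining projects are used for training -- the 7:1 scheme
    when there are eight projects.

    `projects` maps a project name to its list of feature records.
    """
    names = sorted(projects)
    for held_out in names:
        train = {n: projects[n] for n in names if n != held_out}
        test = {held_out: projects[held_out]}
        yield train, test

# Placeholder project names; the study used eight open-source projects.
data = {f"project_{i}": [] for i in range(8)}
splits = list(leave_one_project_out(data))
print(len(splits))  # → 8 splits, each training on 7 projects
```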

Related work

Nagappan et al. used metrics of organisational structure to identify failure-prone binaries with great success [11]. They drew on more organisational information than is available from version control systems; for example, they used information about organisation members who do not commit code and about the organisational hierarchy. Their results also showed that code churn is the second-best predictor of failure-prone binaries.

One of the earliest attempts to estimate code churn was made by Khoshgoftaar et al. [12].

Conclusions and future work

We have shown that, in the context of projects rich in XML and XSLT, organisational metrics provide a better basis for predicting long-term code churn for individual files than code metrics. In fact, models trained on organisational metrics were clearly superior to other models considered in the study. Organisational models give excellent predictions with low error (error below 3400 LOC/year for 95% of the cases). These models can aid in planning the projects by providing insights into the

Acknowledgements

This research was started during a visit of the first author to the Software Engineering Group at University of Zurich (visit funded by the ESF DoRa 6 Program). We thank Harald Gall and the members of his group for their valuable advice. The work is also funded by ERDF via the Estonian Centre of Excellence in Computer Science.

