Reliability analysis and optimal version-updating for open source software

https://doi.org/10.1016/j.infsof.2011.04.005

Abstract

Context

Although reliability is a major concern of most open source projects, research on this problem has attracted attention only recently. In addition, the optimal version-updating for open source software, considering its special properties, has not yet been discussed.

Objective

In this paper, the reliability analysis and optimal version-updating for open source software are studied.

Method

A modified non-homogeneous Poisson process model is developed for open source software reliability modeling and analysis. Based on this model, optimal version-updating for open source software is investigated as well. In the decision process, the rapid release strategy and the level of reliability are the two most important factors. However, they essentially conflict with each other. In order to consider these two conflicting factors simultaneously, a new decision model based on multi-attribute utility theory is proposed.

Results

Our models are tested on real-world data sets from two well-known open source projects: Apache and GNOME. It is found that traditional software reliability models overestimate the reliability of open source software. In addition, the proposed decision model can help management make a rational decision on the optimal version-updating for open source software.

Conclusion

Empirical results reveal that the proposed model for open source software reliability describes the failure process more accurately. Furthermore, the proposed decision model can assist management in appropriately determining the optimal version-update time for open source software.

Introduction

Open source software (OSS) development is a new way of building and deploying large software systems on a global basis, and it differs in many interesting ways from the principles and practices of traditional software engineering [1]. There is widespread recognition across the software industry that open source projects can produce software systems of high quality and functionality, such as the Linux operating system, the Apache web server, the Mozilla browser, and the MySQL database system, that are used by thousands to millions of end-users [2].

OSS development is based on a relatively simple idea: the original core of the OSS system is developed locally by a single programmer or a team of programmers. A prototype system is then released on the Internet so that other programmers can freely read, modify, and redistribute the system's source code. The evolution of OSS is much faster than that of closed-source projects. The reason is that in OSS development, tasks are taken up by volunteers rather than assigned by hierarchical management, and there is no explicit system-level design and no well-defined plan or schedule. A central managing group may check the code, but this process is much less rigid than in closed-source projects.

Several OSS systems are in widespread use with thousands or millions of end-users, e.g. Mozilla, Apache, OpenOffice, Eclipse, NetBeans, GNOME, and Linux. Due to the success of OSS, more and more software companies have switched from closed-source to open-source development in order to win market share and improve product quality [3]. Even leading commercial software companies, such as IBM and Sun, have begun to embrace the open source model and are actively taking part in the development of OSS products.

As OSS applications spread rapidly, it is of great importance to assess the reliability of OSS systems to prevent potential financial loss or reputational damage to the company [4]. With this consideration, many studies have recently been carried out on predicting the number of defects in a system. For instance, Eaddy et al. [5] investigated the relationship between the degree of scattering and the number of defects using stepwise regression and other statistical techniques. Marcus et al. [6] proposed a new measure named Conceptual Cohesion of Classes (C3) to measure cohesion in object-oriented software; they also applied C3 in logistic regression to predict software faults and compared it with other object-oriented metrics. Kim et al. [7] introduced a new technique for predicting latent software bugs in OSS, called change classification. Change classification uses a machine learning classifier to determine whether a new software change is more similar to prior buggy changes or to clean changes. In this manner, change classification predicts the existence of bugs in software changes.
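To illustrate the general shape of the change-classification idea, a toy sketch is given below. The training texts and labels are invented placeholders, and the feature set used in [7] is far richer than this bag-of-words approximation:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy change classification: learn from prior changes labeled buggy/clean,
# then predict whether a new change looks more like past buggy changes.
# All data here are hypothetical placeholders for illustration only.
past_changes = [
    "fix null pointer dereference in session handler",
    "refactor logging module and rename helpers",
    "add missing bounds check to buffer copy",
    "update copyright headers and documentation",
]
labels = ["buggy", "clean", "buggy", "clean"]  # hypothetical change history

clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(past_changes, labels)

# Classify a new, unseen change.
print(clf.predict(["fix off-by-one error in buffer parsing"]))
```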

Although the works above provide important information for assessing OSS reliability, the total number of defects in a software system is an essentially indirect reliability measure in which the time factor is neglected. Only in some recent studies by Tamura and Yamada [8], [9] is this issue considered. In particular, Tamura and Yamada [8] combined neural networks and software reliability growth modeling for the assessment of OSS reliability. In [9], a stochastic differential equation is introduced for modeling OSS reliability, and the optimal version-update time is discussed based on it.

In this paper, we further investigate the modeling of OSS reliability and the determination of its optimal version-update time. Our model is based on the non-homogeneous Poisson process (NHPP), which has proven to be a successful model for software reliability [10], [11], [12], [13]. However, unlike NHPP models for closed-source software and the models proposed in [8], [9], our model incorporates unique patterns of OSS development, such as the multiple-releases property and the hump-shaped fault detection rate function. In addition, because project cost is no longer a crucial factor in optimal release time determination for most OSS projects, we formulate a new version-update time determination problem for OSS. Specifically, multi-attribute utility theory (MAUT) is adopted for this decision process, where two important strategies are considered simultaneously: rapid release of the software to keep sufficient volunteers involved, and an acceptable level of OSS reliability.

The rest of this paper is organized as follows. Section 2 describes our proposed NHPP-based model incorporating the unique properties of OSS. Section 3 formulates the optimal version-update time problem based on MAUT, where the rapid release strategy and the level of reliability are considered simultaneously. Section 4 provides numerical examples for validation purposes based on real-world data sets. Conclusions are drawn in the last section.


Modeling fault detection process of open source software

The underlying software fault detection process is commonly assumed to be a non-homogeneous Poisson process (NHPP) [10], [11], [12], [13]. As software faults are detected, isolated, and removed, the software under test becomes more reliable, with a decreasing failure intensity function. In general, an NHPP software reliability growth model (SRGM) can be developed by solving the following differential equation [14]:

$$\frac{dm(t)}{dt} = b(t)\,[a(t) - m(t)],$$

where m(t), a(t), and b(t) are the mean value function of the fault detection process, the total fault content function, and the fault detection rate function, respectively.
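As a worked note, the equation above admits a closed-form solution; this is standard for this class of models rather than specific to the paper's modified form, and the hump-shaped rate shown is an illustrative assumption, not the paper's exact specification. With initial condition $m(0) = 0$,

$$m(t) = e^{-B(t)} \int_0^t a(s)\, b(s)\, e^{B(s)}\, ds, \qquad B(t) = \int_0^t b(s)\, ds.$$

For constant $a(t) = a$ and $b(t) = b$, this reduces to the classical Goel-Okumoto model, $m(t) = a\,(1 - e^{-bt})$. A hump-shaped detection rate such as $b(t) = c^2\, t\, e^{-ct}$ rises to a peak at $t = 1/c$ and then declines, mirroring how volunteer fault-reporting activity builds up after a release and later fades.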

Determination of optimal version-update time

Optimal release time determination in the testing phase is a typical application of software reliability models. The total expected cost, including both testing cost and operation cost, is a crucial factor in such determination [14], [15], [16]. In fact, software cost estimation is also important in software development [17]. However, most OSS projects are interest-driven, and most development activities in OSS projects are accomplished by volunteer users. Consequently, cost is no longer a crucial factor in determining the version-update time for OSS.
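To make the decision structure concrete, the following is a minimal sketch of an additive MAUT formulation. It assumes a Goel-Okumoto mean value function as a stand-in for the paper's modified NHPP model; the attribute utilities u_time and u_rel, the weights w_time and w_rel, and all numeric values are illustrative assumptions, not the paper's specification:

```python
import numpy as np

def m(t, a=100.0, b=0.05):
    # Goel-Okumoto mean value function m(t) = a(1 - e^{-bt}), used here as
    # a stand-in for the paper's modified NHPP model.
    return a * (1.0 - np.exp(-b * t))

def reliability(T, x=10.0):
    # Standard NHPP conditional reliability: probability of observing no
    # failure in (T, T + x] when the version is updated at time T.
    return np.exp(-(m(T + x) - m(T)))

def utility(T, w_time=0.4, w_rel=0.6, T_max=200.0):
    # Additive MAUT: releasing early serves the rapid release strategy,
    # releasing late serves reliability; the two attributes conflict.
    u_time = 1.0 - T / T_max  # decreases in T (rewards early release)
    u_rel = reliability(T)    # increases in T (rewards reliability growth)
    return w_time * u_time + w_rel * u_rel

# Grid search over candidate update times.
Ts = np.linspace(1.0, 200.0, 400)
T_star = Ts[np.argmax([utility(T) for T in Ts])]
print(f"Illustrative optimal version-update time: {T_star:.1f}")
```

With these assumed parameters the utility has an interior maximum: very early updates sacrifice too much reliability, while very late updates forfeit the rapid release benefit.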

Numerical examples

Special properties of OSS are incorporated into the proposed model for open source software reliability. In order to compare the proposed model against traditional models for reliability assessment, numerical examples are provided based on two real-world data sets from two well-known open source projects: Apache and GNOME. Furthermore, based on the failure data from the first release of Apache, a decision model application example is provided, and sensitivity analysis is introduced to help management examine how the optimal version-update time responds to changes in the model parameters.

Conclusions

The OSS approach provides a new paradigm of software development in which volunteer participation is a critical issue. Since volunteers are interest-driven and the attractiveness of a specific release of software generally decreases over time, multiple releases are needed to maintain a sufficient number of volunteers and to attract newcomers. In order to describe these unique properties of OSS properly, a modified NHPP model is proposed to assess OSS reliability. Based on this model, a decision framework built on multi-attribute utility theory is formulated to determine the optimal version-update time, in which the rapid release strategy and the level of reliability are considered simultaneously.

References (36)

  • X. Li et al., Sensitivity analysis of release time of software reliability models incorporating testing effort with multiple change-points, Applied Mathematical Modelling (2010).
  • E.S. Raymond, The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary (2001).
  • A. Mockus et al., Two case studies of open source software development: Apache and Mozilla, ACM Transactions on Software Engineering and Methodology (2002).
  • T. Gyimothy et al., Empirical validation of object-oriented metrics on open source software for fault prediction, IEEE Transactions on Software Engineering (2005).
  • M. Eaddy et al., Do crosscutting concerns cause defects?, IEEE Transactions on Software Engineering (2008).
  • A. Marcus et al., Using the conceptual cohesion of classes for fault prediction in object-oriented systems, IEEE Transactions on Software Engineering (2008).
  • S. Kim et al., Classifying software changes: clean or buggy?, IEEE Transactions on Software Engineering (2008).
  • Y. Tamura et al., A component-oriented reliability assessment method for open source software, International Journal of Reliability, Quality and Safety Engineering (2008).