Measuring the health of open source software ecosystems: Beyond the scope of project health
Introduction
“Ruby or Python?” “SugarCRM or a closed-source competitor?” “Drupal or Joomla?” “RedHat or Ubuntu?” These are questions often asked by developers, professionals, entrepreneurs, architects, and other stakeholders in software-producing organizations. Choosing between ecosystems is a complex task, and the decision determines many future developments within an organization. At present, the only way to answer such a question is to read widely, ask around, and find out what the risks of entering an ecosystem are. One indicator of whether an ecosystem is alive is the health of its keystone project, judged for instance by the activity surrounding the Ubuntu project. Such activity consists of commits, recent releases, fixes, number of downloads, response times in forums and bug trackers, activity on e-mail lists, and contributions from non-developers. However, project health ≠ ecosystem health.
Ecosystem health is operationalized in this work by taking a combined view of a keystone project and its surrounding projects. This work stands on the shoulders of two relevant contributions in the field of ecosystem health measurement. First, the work by Crowston et al. [3], who provided a first operationalization of open source software project health, is used to establish health factors on the project level. Their work is also fundamental to OSSMole, a collection of meta-data about projects in some of the main repositories, such as GitHub and SourceForge. Second, the work of den Hartigh et al. [6], which provides an operationalization of health measurement for a commercial ecosystem, is followed as closely as possible.
Software ecosystems are sets of actors functioning as a unit and interacting with a shared market for software and services, together with the relationships among them [15]. A healthy unit should thus express qualities typically associated with health: liveliness, activity, longevity, etc. For this work, we take a simple definition for software ecosystem health: longevity and a propensity for growth [19]. The definition is only the first step, as both longevity and propensity for growth can be operationalized in different ways with a plethora of different metrics.
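To make the point about multiple operationalizations concrete, the following is a minimal sketch of one possible choice, not the operationalization used in this work. It assumes hypothetical monthly active-contributor counts per project, all aligned to the same starting month. Longevity is taken as the span between a project's first and last active month, and propensity for growth as the least-squares slope of the ecosystem-wide contributor count. The names `longevity_months`, `growth_slope`, and `ecosystem_summary` are illustrative only.

```python
def longevity_months(monthly_active: list[int]) -> int:
    """Months from the first to the last month with activity, inclusive."""
    active = [i for i, count in enumerate(monthly_active) if count > 0]
    if not active:
        return 0
    return active[-1] - active[0] + 1


def growth_slope(counts: list[int]) -> float:
    """Least-squares slope of counts over the month index."""
    n = len(counts)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def ecosystem_summary(projects: dict[str, list[int]]) -> dict[str, float]:
    """Combine the keystone and surrounding projects into one view.

    Assumption: every series starts in the same calendar month; shorter
    series are treated as having no activity in the missing months.
    """
    horizon = max(len(series) for series in projects.values())
    totals = [
        sum(series[m] for series in projects.values() if m < len(series))
        for m in range(horizon)
    ]
    return {
        "max_longevity_months": float(
            max(longevity_months(series) for series in projects.values())
        ),
        "growth_slope": growth_slope(totals),
    }


# Hypothetical data: one keystone project and one surrounding project.
example = {
    "keystone": [5, 6, 7, 8],
    "plugin": [0, 1, 1, 2],
}
summary = ecosystem_summary(example)
```

A different study could just as reasonably measure longevity in releases and growth in downloads or projects, which is why the definition alone is only a first step.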
There is a distinct need for an Open Source Ecosystem Health Operationalization (OSEHO). Manikas and Hansen [23] recently published a call to action for the creation of such an operationalization, and laid the groundwork for it. Also, in our research agenda for software ecosystems [17], we call for more research into ecosystem health. Others have attempted to create their own operationalization, but these typically get stuck in the concept phase [31], [3], [30]. In this article, an OSEHO is provided and evaluated using four research projects into open source ecosystem health.
We continue this work with a description of the literature on health measurement in ecosystems and open source projects. Section 3 discusses the creation of the OSEHO and its evaluation challenges. In Section 4, the OSEHO that provides methods for measuring health of open source software ecosystems is presented, consisting of a generic ecosystem health model and a set of methods for analyzing open source ecosystem health. In Section 5 four research projects are presented that apply parts of the model in practice. Furthermore, an analysis of the research projects and their aims (provide insight mostly), the indicators most frequently used (active developers, projects), and the research methods applied (mining repositories, web scraping) are presented. Section 6 presents a set of challenges that are met when applying the model and that were found in the four research projects, mostly having to do with data selection, preparation, and analysis. The article ends with a discussion on the applicability of an OSEHO and a summary of the conclusions and future research challenges.
Literature about ecosystem health
There is surprisingly little literature available about open source ecosystem health. Different perceptions exist and frequently ecosystem and project health are used interchangeably, such as in the work of Lundell et al. [20], who discuss open source ecosystems as being equal to one project. In the continued work of Gamalielsson et al. [9], [8], the responsiveness of developers on the mailing list of the Nagios community is measured as an indicator for open source community health, but does
Research approach
The goal of this research is to provide a comprehensive overview of the health metrics that can be used to determine the health of an open source ecosystem. It does so by creating an inventory of all metrics mentioned in literature that could potentially indicate ecosystem health and then placing these metrics in a framework. The framework can be applied by researchers who aim to reach a goal associated with ecosystem health, such as improve activity in an ecosystem, evaluate the health of one
Open Source Ecosystem Health Operationalization (OSEHO)
Fig. 1 represents the OSEHO. The framework is built up out of three pillars, being the productivity, robustness, and niche creation pillars, which are addressed in the discussion of the literature in Section 2. The pillars are separated into three layers, being the theory level, the network level, and the project level. At the top level is displayed what the theoretical model of Den Hartigh prescribes to use as guidelines for operationalizing the health concept, which in turn is inspired by
Analysis of the research projects
Four research projects have been selected to illustrate the use of the OSEHO. The selection criteria have been listed in Section 3.
The first project applies ecosystem health metrics to determine how healthy the ecosystems surrounding commercial Platform as a Service providers are [19]. The goal was to provide stakeholders in these ecosystems with insight into their ecosystem development and the most important metrics that indicate success in these ecosystems. The data source was GitHub and
Repository mining research challenges
The research challenges encountered in the projects are listed in Table 1 and are summarized into a set of common challenges that together form a research agenda. Each of the terms in bold can be considered a challenge for any new ecosystem (health) study that involves repository mining. The challenges are split into data selection challenges and data preparation and analysis challenges.
Discussion
The framework is evaluated using the research projects described in the previous sections. There are currently few works on ecosystem health available and the selection of just four research projects is somewhat meager. As these research projects do not fully cover the metrics in the framework, the work cannot be considered completely evaluated. The OSEHO can be further evaluated in the future with more projects that study ecosystem health. The framework, however, is the most complete framework
References (31)
- et al., Shades of gray: opening up a software producing organization with the open software enterprise model, J. Syst. Software (2012)
- et al., A framework for software ecosystem governance
- et al., A systematic mapping study on software ecosystems through a three-dimensional perspective
- et al., Information systems success in free and open source software development: theory and measures, Software Process: Improvement and Practice (2006)
- et al., Open source software projects as virtual organisations: competency rallying for software development, IEEE Software (2002)
- Staying Power: Six Enduring Principles for Managing Strategy and Innovation in an Uncertain World (Lessons from Microsoft, Apple, Intel, Google, Toyota and More) (2012)
- E. den Hartigh, M. Tol, W. Visscher, The health measurement of a business ecosystem, in: S. Jansen, M. Cusumano, S....
- et al., Guiding principles of natural ecosystems and their applicability to software ecosystems
- et al., The nagios community: an extended quantitative analysis
- J. Gamalielsson, B. Lundell, B. Lings, Responsiveness as a measure for assessing the health of oss ecosystems, in:...
- The ghtorent dataset and tool suite
- Python: characteristics identification of a free open source software ecosystem
- The Keystone Advantage: What the New Dynamics of Business Ecosystems Mean for Strategy, Innovation, and Sustainability
- Strategy as ecology, Harvard Business Review