Data sets describing the circle of life in Ruby hosting, 2003–2016

Empirical Software Engineering

Abstract

Studying software repositories and hosting services can provide valuable insights into the behaviors of large groups of software developers and their projects. Traditionally, most analyses of metadata collected from software project hosting services have covered a short window of time, typically just a few years. To date, few, if any, studies have been built from data spanning the entire lifespan of a hosting facility: from its birth to its death, and its rebirth in another form. Thus, the first contribution of this paper is to present two data sets that support the historical analysis of over ten years of metadata collected from the now-defunct RubyForge project hosting site, as well as from its successor, the RubyGems package (“gem”) hosting facility. The data sets and sample uses demonstrated in this paper include: analyses of overall forge growth over time; data and analyses of project-level characteristics on both forges and their changes over time (for example, in licenses and languages); and demonstrations of how to use developer-level metadata (for example, counts of new developers and calculations of developer-project density) to assess changes in person-level activity on both sites over time. Finally, because RubyForge was phased out and its gem-hosting portion was replaced by RubyGems, all the gems within RubyForge projects were transferred, by project owners and by the site owners themselves, into the RubyGems hosting facility. The data sets in this paper therefore represent a unique opportunity to study projects as they moved from one ecosystem to another, and we show several methods for locating related projects across the two forges and for building a cross-forge, longitudinal project history using information from both.
The data sets and sample analyses presented here will be relevant to researchers studying long-term software evolution and distributed, hosted, or collaborative software development environments.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Notes

  1. This work is an extension of a short paper previously presented in the Data Showcase track at The International Conference on Mining Software Repositories in 2016. The full citation for that prior work can be found in Squire (2016a).

  2. http://flossmole.org

  3. http://flossdata.syr.edu/data/rf and http://flossdata.syr.edu/data/rg

  4. http://flossmole.org/content/direct-db-access-flossmole-collection-available

  5. https://github.com/FLOSSmole/rubyforge

  6. https://sf.net

  7. https://launchpad.net

  8. https://tigris.org

  9. https://code.google.com

  10. https://codeplex.com

  11. https://alioth.debian.org

  12. http://forge.ow2.org

References

Acknowledgements

We gratefully acknowledge the National Science Foundation (grant number NSF-14-05643) for supporting this work.

Author information

Correspondence to Megan Squire.

Appendix: SQL code used to generate figures and tables

Listing A

SQL to create Fig. 3. Cumulative project count over the lifespan of RubyForge, by month and year.

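A minimal sketch of a cumulative monthly project count, assuming a hypothetical table `rf_projects(proj_unixname, date_registered, datasource_id)` in which `date_registered` is ISO-formatted text and `datasource_id` identifies one FLOSSmole collection. The table and column names are illustrative, not the actual FLOSSmole layout, and SQLite stands in for the production database. SQL does the per-month grouping and Python keeps the running total.

```python
import sqlite3
from itertools import accumulate

# Hypothetical schema (not the actual FLOSSmole layout):
#   rf_projects(proj_unixname TEXT, date_registered TEXT, datasource_id INTEGER)
# date_registered is assumed to be ISO text such as '2003-07-10 14:02:00'.
MONTHLY_COUNTS_SQL = """
SELECT substr(date_registered, 1, 7) AS month,  -- 'YYYY-MM'
       COUNT(*) AS new_projects
FROM rf_projects
WHERE datasource_id = ? AND date_registered IS NOT NULL
GROUP BY month
ORDER BY month
"""


def cumulative_projects_by_month(conn: sqlite3.Connection, datasource_id: int):
    """Return [(month, running_total), ...] for one data collection.

    Months with no registrations are absent from the result; the running
    total simply carries over such gaps.
    """
    rows = conn.execute(MONTHLY_COUNTS_SQL, (datasource_id,)).fetchall()
    months = [month for month, _ in rows]
    running = accumulate(count for _, count in rows)
    return list(zip(months, running))
```

Slicing the first seven characters of an ISO timestamp yields a `YYYY-MM` key that sorts correctly as a string, so `ORDER BY month` gives chronological order without date parsing.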

Listing B

SQL for Fig. 4. New project registration dates for all projects in RubyForge, by month and year. Uses the final RubyForge data collection from 2013 (#12987).

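For a per-month bar chart like Fig. 4, months with no registrations should appear as zeros rather than vanish. The following is a sketch under stated assumptions: it uses a hypothetical table `rf_projects(proj_unixname, date_registered, datasource_id)`, takes the collection number #12987 from the caption above, and zero-fills every calendar month between the first and last registration.

```python
import sqlite3

# Collection #12987 is the final (2013) RubyForge collection named in Listing B.
FINAL_RUBYFORGE_COLLECTION = 12987

# Hypothetical table: rf_projects(proj_unixname TEXT, date_registered TEXT, datasource_id INTEGER)
YEAR_MONTH_SQL = """
SELECT CAST(strftime('%Y', date_registered) AS INTEGER) AS yr,
       CAST(strftime('%m', date_registered) AS INTEGER) AS mo,
       COUNT(*) AS n
FROM rf_projects
WHERE datasource_id = ?
  AND strftime('%Y-%m', date_registered) IS NOT NULL  -- skip missing or unparseable dates
GROUP BY yr, mo
ORDER BY yr, mo
"""


def registrations_by_month(conn, datasource_id=FINAL_RUBYFORGE_COLLECTION):
    """Return [('YYYY-MM', count), ...] with zero-filled gaps between first and last month."""
    rows = conn.execute(YEAR_MONTH_SQL, (datasource_id,)).fetchall()
    if not rows:
        return []
    counts = {(yr, mo): n for yr, mo, n in rows}
    yr, mo = rows[0][0], rows[0][1]
    last = (rows[-1][0], rows[-1][1])
    series = []
    while (yr, mo) <= last:
        series.append((f"{yr:04d}-{mo:02d}", counts.get((yr, mo), 0)))
        # Advance one calendar month, rolling December over into January.
        yr, mo = (yr + 1, 1) if mo == 12 else (yr, mo + 1)
    return series
```

Zero-filling in Python keeps the SQL portable. A database-side alternative would need a calendar table or a recursive CTE.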

Listing C

SQL for Fig. 5. Creation dates for gems in RubyGems, by month and year.

Listing D

SQL used to generate 2006 data (#24) for Table 4

SQL used to generate 2013 data (#12987) for Table 4

Listing E

SQL used to generate 2006 data (#24) for Table 9.

SQL used to generate 2013 (#12987) data for Table 9

Listing F

SQL code to get total number of projects for latest RubyGems collection (July 2016, #61243).

Get the count of those that have no license (including those whose self-reported license is “none”, etc.).

Get the count for each license, combining spelling variants, and retain the top ten for Table 11.
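One way to implement this license tally is to pull the raw license strings for collection #61243 and normalize them in Python. The sketch assumes a hypothetical table `rg_gems(name, license, datasource_id)` with one free-text license string per gem. The "no license" markers and the spelling-variant map below are illustrative assumptions, not the ones used to build Table 11.

```python
import sqlite3
from collections import Counter

# Illustrative assumptions: strings treated as "no license", and spelling
# variants folded into one canonical name (keys are lowercase, whitespace-collapsed).
NO_LICENSE_MARKERS = {"", "none", "n/a", "null"}
LICENSE_ALIASES = {
    "mit": "MIT",
    "mit license": "MIT",
    "the mit license": "MIT",
    "apache 2.0": "Apache-2.0",
    "apache-2.0": "Apache-2.0",
    "apache license 2.0": "Apache-2.0",
}


def normalize_license(raw):
    """Map a raw license string to a canonical name, or None if there is no license."""
    if raw is None:
        return None
    key = " ".join(raw.lower().split())
    if key in NO_LICENSE_MARKERS:
        return None
    return LICENSE_ALIASES.get(key, raw.strip())


def license_summary(conn, datasource_id, top_n=10):
    """Total gems, gems without a license, and the top_n licenses after normalization."""
    rows = conn.execute(
        "SELECT license FROM rg_gems WHERE datasource_id = ?", (datasource_id,)
    ).fetchall()
    normalized = [normalize_license(lic) for (lic,) in rows]
    tally = Counter(lic for lic in normalized if lic is not None)
    return {
        "total": len(rows),
        "no_license": normalized.count(None),
        "top": tally.most_common(top_n),
    }
```

Normalizing in application code keeps the alias list easy to extend as new misspellings turn up. Gems that declare several licenses would need an extra splitting step, which this sketch does not attempt.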

Listing G

Generate project count per developer for final RubyForge collection (2013).

How many developers worked on more than 10 projects?
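Both questions in Listing G reduce to a `GROUP BY` on the developer, with a `HAVING` clause for the "more than 10 projects" cutoff. This is a sketch against a hypothetical membership table `rf_project_devs(datasource_id, proj_unixname, dev_loginname)`. `COUNT(DISTINCT ...)` guards against duplicate membership rows.

```python
import sqlite3

# Hypothetical membership table, one row per (developer, project) pair:
#   rf_project_devs(datasource_id INTEGER, proj_unixname TEXT, dev_loginname TEXT)
PROJECTS_PER_DEV_SQL = """
SELECT dev_loginname, COUNT(DISTINCT proj_unixname) AS n_projects
FROM rf_project_devs
WHERE datasource_id = ?
GROUP BY dev_loginname
ORDER BY n_projects DESC, dev_loginname
"""

# HAVING filters on the per-developer aggregate; the outer query counts the survivors.
DEVS_ABOVE_THRESHOLD_SQL = """
SELECT COUNT(*) FROM (
    SELECT dev_loginname
    FROM rf_project_devs
    WHERE datasource_id = ?
    GROUP BY dev_loginname
    HAVING COUNT(DISTINCT proj_unixname) > ?
)
"""


def projects_per_developer(conn, datasource_id):
    """Return [(developer, project_count), ...], most prolific first."""
    return conn.execute(PROJECTS_PER_DEV_SQL, (datasource_id,)).fetchall()


def developers_above(conn, datasource_id, threshold=10):
    """Count developers who worked on more than `threshold` projects."""
    return conn.execute(DEVS_ABOVE_THRESHOLD_SQL, (datasource_id, threshold)).fetchone()[0]
```

Under these assumptions, `developers_above(conn, 12987)` would answer the second question for the final 2013 RubyForge collection.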

Listing H

Find the number of gems per owner.

Find the number of gems per author.
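Owners and authors may need different treatment. Ownership fits naturally into one row per (gem, owner) pair, while author names are assumed here to arrive as a single free-text string per gem. The sketch below uses hypothetical table and column names throughout. Splitting on commas is a heuristic: it will mis-split names such as "Smith, Jr." and will not separate lists joined with "and".

```python
import sqlite3
from collections import Counter

# Hypothetical tables:
#   rg_gem_owners(gem_name TEXT, owner_handle TEXT)  -- one row per owner of a gem
#   rg_gems(name TEXT, authors TEXT)                  -- authors as one string, e.g. "Ann Lee, Bo Kim"


def gems_per_owner(conn):
    """Map each owner handle to the number of distinct gems it owns."""
    rows = conn.execute(
        "SELECT owner_handle, COUNT(DISTINCT gem_name) FROM rg_gem_owners GROUP BY owner_handle"
    ).fetchall()
    return dict(rows)


def gems_per_author(conn):
    """Count gems per author name, splitting each authors string on commas."""
    counts = Counter()
    for _gem, authors in conn.execute(
        "SELECT DISTINCT name, authors FROM rg_gems WHERE authors IS NOT NULL"
    ):
        # A set, so an author listed twice on one gem is counted once for that gem.
        people = {a.strip() for a in authors.split(",") if a.strip()}
        counts.update(people)
    return counts
```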

Listing I

Query to list RubyForge and RubyGems projects that match on URL and to populate the entities table with those pairs. The names and URLs are drawn from two views called rf_entities and rg_entities.

Query to list RubyForge and RubyGems projects that match on name and to add each such pair to the entities table (if that pair is not already in the entities table).

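The two matching passes in Listing I can be sketched as an `INSERT ... SELECT` on URL, followed by a name-based `INSERT ... SELECT` guarded by `NOT EXISTS`, so that pairs already found by URL are not duplicated. The view names `rf_entities` and `rg_entities` come from the caption above. Their `name` and `url` columns, the `entities` table layout, and the URL normalization (case-insensitive, trailing slashes ignored) are assumptions made for illustration.

```python
import sqlite3

# rf_entities and rg_entities are the views named in Listing I; their columns
# (name, url) and the layout of the entities table are assumptions.
CREATE_ENTITIES_SQL = """CREATE TABLE IF NOT EXISTS entities (
    rf_name    TEXT NOT NULL,
    rg_name    TEXT NOT NULL,
    matched_on TEXT NOT NULL,
    PRIMARY KEY (rf_name, rg_name)
)"""

# Step 1: pair projects whose URLs agree, ignoring case and trailing slashes.
URL_MATCH_SQL = """INSERT INTO entities (rf_name, rg_name, matched_on)
SELECT DISTINCT rf.name, rg.name, 'url'
FROM rf_entities AS rf
JOIN rg_entities AS rg
  ON lower(rtrim(rf.url, '/')) = lower(rtrim(rg.url, '/'))
WHERE rf.url IS NOT NULL AND rf.url <> ''"""

# Step 2: pair projects by case-insensitive name, skipping pairs already found by URL.
NAME_MATCH_SQL = """INSERT INTO entities (rf_name, rg_name, matched_on)
SELECT DISTINCT rf.name, rg.name, 'name'
FROM rf_entities AS rf
JOIN rg_entities AS rg ON lower(rf.name) = lower(rg.name)
WHERE NOT EXISTS (
    SELECT 1 FROM entities AS e
    WHERE e.rf_name = rf.name AND e.rg_name = rg.name
)"""


def match_projects(conn):
    """Populate entities by URL, then by name; return {matched_on: pair_count}.

    Intended to run once against an empty entities table: rerunning the URL
    step would violate the primary key.
    """
    with conn:  # commit both steps together, or roll back on error
        conn.execute(CREATE_ENTITIES_SQL)
        conn.execute(URL_MATCH_SQL)
        conn.execute(NAME_MATCH_SQL)
    return dict(conn.execute(
        "SELECT matched_on, COUNT(*) FROM entities GROUP BY matched_on"
    ).fetchall())
```

Recording `matched_on` makes it easy to audit the weaker name-only matches separately from the URL matches.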

Listing J

A GHTorrent query to show the number of new GitHub projects, with and without forks, over time, as discussed in Section 1.

FLOSSmole queries to show the total number of projects hosted on CodePlex and Google Code, the two largest code forges after GitHub.
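For the GHTorrent part of Listing J, a single conditional aggregate can produce both monthly series, with and without forks, in one pass. This sketch assumes GHTorrent's `projects` table, where `created_at` holds the creation timestamp and `forked_from` is NULL for projects that are not forks. It runs on SQLite for illustration; GHTorrent itself is distributed as a MySQL database, where the month bucket would be written differently.

```python
import sqlite3

# Assumes GHTorrent's projects table: created_at is the creation timestamp
# and forked_from is NULL for projects that are not forks.
# strftime() is the SQLite spelling of a month bucket; MySQL would use
# DATE_FORMAT(created_at, '%Y-%m').
NEW_PROJECTS_SQL = """
SELECT strftime('%Y-%m', created_at) AS month,
       COUNT(*) AS with_forks,
       SUM(CASE WHEN forked_from IS NULL THEN 1 ELSE 0 END) AS without_forks
FROM projects
WHERE created_at IS NOT NULL
GROUP BY month
ORDER BY month
"""


def new_projects_by_month(conn):
    """Return [(month, all_new_projects, non_fork_projects), ...]."""
    return conn.execute(NEW_PROJECTS_SQL).fetchall()
```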

About this article

Cite this article

Squire, M. Data sets describing the circle of life in Ruby hosting, 2003–2016. Empir Software Eng 23, 1123–1152 (2018). https://doi.org/10.1007/s10664-017-9581-6

  • DOI: https://doi.org/10.1007/s10664-017-9581-6
