Published December 8, 2023 | Version v2
Dataset Open

Gitome: A curated dataset for GitHub README-related tasks

  • 1. University of L'Aquila
  • 2. Università degli Studi dell'Aquila

Description

About 

This repository contains the source code implementation used to replicate the experimental results reported in the paper submitted to the 21st International Conference on Mining Software Repositories (MSR 2024):

"Gitome: A curated dataset for GitHub README-related tasks"

authored by:

Claudio Di Sipio, Juri Di Rocco, Riccardo Rubei, Phuong Than Nguyen, and Davide Di Ruscio,

Università degli Studi dell'Aquila, Italy

Data description 

The dataset is structured as follows: 

  • emf_datamodel.zip: It contains the Ecore project with the Gitome data model
  • existing_dumps.zip: It contains the existing datasets used to build Gitome
  • lang_aggr_stats.csv: It contains the language data to compute the statistics presented in the paper
  • langs.csv: It contains all the languages and their frequency
  • output_dataset.zip: It contains the benchmarking dataset obtained by parsing the README files
  • repository_lists.zip: It contains the list of repositories for each considered dataset (with possible duplicates)
  • topics.csv: It contains all the topics and their frequency
  • topics_aggr_stats.csv: It contains the topics data to compute the statistics presented in the paper
  • gitome_repo.txt: It contains the list of the URLs of the considered GitHub repositories
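As a minimal sketch of how the repository lists might be consumed, the Python snippet below parses GitHub repository URLs into (owner, name) pairs and drops duplicates, since the lists may contain them. It assumes one full URL per line (including the `https://` scheme), which is not confirmed by this record. The function names `parse_repo_url` and `unique_repos` are hypothetical helpers, not part of the Gitome tooling.

```python
from urllib.parse import urlparse


def parse_repo_url(url):
    """Return (owner, name) for a GitHub repository URL, or None otherwise.

    Assumption: the URL includes a scheme, e.g. https://github.com/owner/name.
    """
    parsed = urlparse(url.strip())
    if parsed.netloc.lower() not in {"github.com", "www.github.com"}:
        return None
    # Keep only non-empty path segments: /owner/name/... -> [owner, name, ...]
    parts = [segment for segment in parsed.path.split("/") if segment]
    if len(parts) < 2:
        return None
    owner, name = parts[0], parts[1]
    if name.endswith(".git"):
        name = name[:-4]
    return owner, name


def unique_repos(lines):
    """Return the repositories found in `lines`, in first-seen order, without duplicates.

    Owner and name are compared case-insensitively; lines that are not
    GitHub repository URLs are skipped.
    """
    seen = set()
    result = []
    for line in lines:
        repo = parse_repo_url(line)
        if repo is None:
            continue
        key = (repo[0].lower(), repo[1].lower())
        if key not in seen:
            seen.add(key)
            result.append(repo)
    return result
```

Under the one-URL-per-line assumption, passing an open file handle for gitome_repo.txt to `unique_repos` would yield the de-duplicated repository list.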

How to collect Gitome

To collect all the data stored in this archive, please refer to the supporting GitHub repository: https://github.com/MDEGroup/Gitome-MSR2024

Files

emf_datamodel.zip

Files (85.2 MB)

MD5 checksum and size of each file:

  • md5:a82b1b6f05340f443f3510e4f7f7ffdd (5.6 MB)
  • md5:9d151fe8bd63cd469213f1ae3bdccef3 (24.4 MB)
  • md5:53f4a2fba4e1a9357cf7927cb52bdb3a (308.4 kB)
  • md5:acf550b22a0436b01c1abfe086adfb26 (4.5 kB)
  • md5:b60dd24aa832a33e1ad238989eb6095d (3.9 kB)
  • md5:a68e5092ec4140bcfd08d41fc284f2b4 (54.4 MB)
  • md5:94705f13ac4ea57f3c424c80cad1053a (189.9 kB)
  • md5:e7ba7a067f3fc48a2e47082da9689d87 (183.4 kB)
  • md5:5aa8612879af6b194b04a84e6a185f0d (5.9 kB)
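
The MD5 checksums listed above can be used to check that a downloaded file is intact. The following is a minimal Python sketch using the standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the sketch does not assume which checksum belongs to which file, since that mapping is not shown here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that large archives do not have to
    fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Return True if the file's MD5 equals `expected`.

    `expected` may be given with or without the "md5:" prefix used in
    the file listing.
    """
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[4:]
    return md5_of_file(path) == expected_hex
```

A call such as `matches_checksum(local_path, "md5:...")` with the checksum copied from the listing returns `True` only when the local copy matches the published file.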